Shared mixed-reality environments responsive to motion-capture data

ABSTRACT

An immersive content presentation system can capture the motion or position of a performer in a real-world environment. A game engine can be modified to receive the position or motion of the performer and identify predetermined gestures or positions that can be used to trigger actions in a 3-D virtual environment, such as generating a digital effect, transitioning virtual assets through an animation graph, adding new objects, and so forth. Views of the 3-D virtual environment can be rendered, and composited views can be generated. Information for constructing the composited views can be streamed to numerous display devices in many different physical locations using a customized communication protocol. Multiple real-world performers can interact with virtual objects through the game engine in a shared mixed-reality experience.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application claims the benefit of the following U.S. Provisional Application:

-   U.S. Provisional Application No. 62/558,249, filed on Sep. 13, 2017, entitled "REAL-TIME IMMERSIVE CONTENT PRESENTATION SYSTEM," by Brickhill et al., which is incorporated herein by reference.

The following related U.S. Nonprovisional Applications are being filed on the same day as the present application:

-   U.S. Nonprovisional application Ser. No. 16/130,240, filed on Sep. 13, 2018, entitled "REAL-TIME VIEWS OF MIXED-REALITY ENVIRONMENTS RESPONSIVE TO MOTION-CAPTURE DATA," by Cordes et al., which is incorporated herein by reference.
-   U.S. Nonprovisional application Ser. No. 16/130,258, filed on Sep. 13, 2018, entitled "GAME ENGINE RESPONSIVE TO MOTION-CAPTURE DATA FOR MIXED-REALITY ENVIRONMENTS," by Cordes et al., which is incorporated herein by reference.
-   U.S. Nonprovisional application Ser. No. 16/130,269, filed on Sep. 13, 2018, entitled "COMMUNICATION PROTOCOL FOR STREAMING MIXED-REALITY ENVIRONMENTS BETWEEN MULTIPLE DEVICES," by Cordes et al., which is incorporated herein by reference.

TECHNICAL FIELD

This application discloses technology related to the fields of computer animation, virtual reality environments, and digital content generation. Specifically, this application discloses technology for using real-time gestures to drive computer-generated assets in a multi-device viewing environment.

BACKGROUND

Augmented reality includes a live view of a real-world environment that is augmented by computer-generated sensory input(s), such as GPS data, graphics, video, sound, statistics, and so forth. In contrast to virtual reality, which replaces the real-world environment with a simulated one, augmented reality elements are often displayed in real time in semantic context with elements of the real-world environment. For example, sports scores can be displayed on the same television screen during a basketball game. Head-mounted displays can also be used to place virtual images over a view of the physical world such that both are in the user's field of view.

Virtual reality is very similar to augmented reality in that it entails an interactive, computer-generated experience. However, virtual reality takes place within a simulated, immersive environment. Current virtual reality technology commonly uses virtual-reality headsets or multi-projected environments to create realistic sensations, sounds, and images that simulate the user having a physical presence in the virtual or imaginary environment. This effect is usually implemented using a virtual-reality headset having a head-mounted display comprising a small screen positioned in front of the eyes. In a multi-projected environment, virtual reality may be implemented using multiple large projection screens that surround the user.

Digital artists have long sought to integrate human performances together with animated CGI performances. This has been accomplished mainly in a production environment where scripted human sequences can be composited with pre-animated CGI performances. The combination of these elements can be displayed in the same video sequence to give the illusion that the CGI character and the human character coexist in the scene. However, making interactions between the CGI character and the human character appear seamless and realistic requires rehearsal and extensive planning and timing considerations. This is particularly true for physical interactions between a human actor and elements or characters in a virtual environment. For example, to realistically simulate a human actor lifting a digital character, the timing, placement, motion sequence, and speed of the human actions must be pre-scripted and built into the pre-animated sequence of the digital character. In other words, the human character has to follow the lead of the digital character. Any deviation from the pre-planned sequence will create visual and/or auditory discontinuities in the presentation that are immediately apparent to audiences. These discontinuities destroy the feeling that producers seek to create in a virtual/augmented reality environment.

BRIEF SUMMARY

In some embodiments, a method may include receiving a first motion or position of a first performer in a first real-world environment; identifying the first motion or position as a first predefined motion or position; altering a virtual asset in a 3-D virtual environment in response to identifying the first motion or position as the first predefined motion or position; receiving a second real-time motion or position of a second performer in a second real-world environment; identifying the second real-time motion or position as a second predefined motion or position; and altering the virtual asset in the 3-D virtual environment in response to identifying the second motion or position as the second predefined motion or position.

In some embodiments, an immersive content presentation system may include one or more processors and one or more memory devices including instructions that, when executed by the one or more processors, cause the one or more processors to perform operations including receiving a first motion or position of a first performer in a first real-world environment; identifying the first motion or position as a first predefined motion or position; altering a virtual asset in a 3-D virtual environment in response to identifying the first motion or position as the first predefined motion or position; receiving a second real-time motion or position of a second performer in a second real-world environment; identifying the second real-time motion or position as a second predefined motion or position; and altering the virtual asset in the 3-D virtual environment in response to identifying the second motion or position as the second predefined motion or position.

In some embodiments, a non-transitory, computer-readable medium may include instructions that, when executed by one or more processors, cause the one or more processors to perform operations including receiving a first motion or position of a first performer in a first real-world environment; identifying the first motion or position as a first predefined motion or position; altering a virtual asset in a 3-D virtual environment in response to identifying the first motion or position as the first predefined motion or position; receiving a second real-time motion or position of a second performer in a second real-world environment; identifying the second real-time motion or position as a second predefined motion or position; and altering the virtual asset in the 3-D virtual environment in response to identifying the second motion or position as the second predefined motion or position.

In any embodiments, any or all of the following features may be included in any combination and without limitation. The method/operations may also include rendering a 2-D video stream of the virtual asset in the 3-D virtual environment, where the 2-D video stream of the virtual asset may include the altering of the virtual asset in response to identifying the first motion or position, and the 2-D video stream of the virtual asset may include the altering of the virtual asset in response to identifying the second motion or position. The method/operations may further include causing a real-time video stream to be displayed on a display device, where the real-time video stream may include the 2-D video stream of the virtual asset, and the 2-D video stream may be composited with a real-time view of a second real-world environment. The first real-world environment may be physically separated from the second real-world environment. The first performer need not be visible to the second performer in the first real-world environment. The first performer may be visible to the second performer through the 3-D virtual environment. The first performer may be equipped with a pair of AR glasses. Altering the virtual asset in the 3-D virtual environment in response to identifying the first motion or position as the first predefined motion or position may include causing the virtual asset to be held by a virtual representation of the first performer in the 3-D virtual environment. Altering the virtual asset in the 3-D virtual environment in response to identifying the second motion or position as the second predefined motion or position may include causing the virtual asset to be passed from the virtual representation of the first performer in the 3-D virtual environment to a virtual representation of the second performer in the 3-D virtual environment. The first predefined motion or position may include the first performer closing their hand. The first predefined motion or position may include the first performer pointing their arm or hand at a virtual representation of an object. The first predefined motion or position may include the first performer executing a throwing motion. The second predefined motion or position may include the second performer executing a catching motion. The virtual asset may include a digital effect. The virtual asset may include a CGI character. The first performer in the first real-world environment may wear a motion-capture suit that is recorded with a plurality of motion-capture cameras in a motion-capture system. Identifying the first motion or position as the first predefined motion or position may include receiving a motion-capture frame at a game engine, and comparing the first motion or position to a plurality of motions or positions in a predefined library of motions or positions. Identifying the first motion or position as the first predefined motion or position may include comparing calculated motion vectors of vertices on the motion-capture frame with the predefined library of motions or positions.
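For illustration only, the following Python sketch walks through the claimed two-performer flow, receiving a motion, matching it against a predefined library, and altering a shared virtual asset. The gesture names, asset structure, and functions are assumptions made for clarity, not a described implementation:

```python
from typing import Optional

# Hypothetical library mapping predefined motions to asset alterations.
PREDEFINED_LIBRARY = {
    "throw": "launched",
    "catch": "held",
}

def identify(motion_or_position: str) -> Optional[str]:
    """Return the matched predefined motion/position, or None."""
    return motion_or_position if motion_or_position in PREDEFINED_LIBRARY else None

def alter_virtual_asset(asset: dict, performer: str, matched: str) -> None:
    """Alter the shared virtual asset in the 3-D virtual environment."""
    asset["state"] = PREDEFINED_LIBRARY[matched]
    asset["controlled_by"] = performer

# A shared virtual asset, e.g., a ball passed between two remote performers.
ball = {"state": "idle", "controlled_by": None}

first = identify("throw")    # first motion, captured in the first environment
if first:
    alter_virtual_asset(ball, "performer_1", first)

second = identify("catch")   # second motion, captured in the second environment
if second:
    alter_virtual_asset(ball, "performer_2", second)
```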

BRIEF DESCRIPTION OF THE DRAWINGS

A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings, wherein like reference numerals are used throughout the several drawings to refer to similar components.

In some instances, a sub-label is associated with a reference numeral to denote one of multiple similar components. When reference is made to a reference numeral without specification to an existing sub-label, it is intended to refer to all such multiple similar components.

FIG. 1 illustrates a first performance area where images and positions of characters and props may be captured by the presentation system.

FIG. 2A illustrates a depth image that may be captured by the one or more depth cameras.

FIG. 2B illustrates a 3-D virtual model of the first performance area using predefined 3-D objects, according to some embodiments.

FIG. 3 illustrates a second performance area for a motion-capture performer to drive the visible characteristics of one or more virtual assets, according to some embodiments.

FIG. 4 illustrates a resulting motion-capture output derived from the images captured by the motion-capture cameras, according to some embodiments.

FIG. 5 illustrates a view of a 3-D environment that includes virtual assets that are generated and/or controlled by movements of the motion-capture frame, according to some embodiments.

FIG. 6A illustrates a view of the 3-D environment that incorporates elements based on the first performance area, according to some embodiments.

FIG. 6B illustrates another view of the 3-D environment that incorporates elements based on the first performance area, according to some embodiments.

FIG. 7 illustrates a rendered 2-D image of the 3-D environment before it is composited for display, according to some embodiments.

FIG. 8 illustrates a composited view that combines an image of the first performance area with the rendered 2-D image of the 3-D environment, according to some embodiments.

FIG. 9 illustrates a display of a composited view on a pair of AR glasses, according to some embodiments.

FIG. 10 illustrates an embodiment that displays a composited view on a mobile device equipped with a camera and a display screen, according to some embodiments.

FIG. 11 illustrates the first performance area combined with the second performance area in the same physical space, according to some embodiments.

FIG. 12 illustrates a composited view of the first performance area as displayed on a display screen.

FIG. 13 illustrates a flowchart of a method for generating an immersive experience that mixes real-world and virtual-world content, according to some embodiments.

FIG. 14A illustrates a block diagram of an immersive content presentation system, according to some embodiments.

FIG. 14B illustrates a block diagram of an alternate arrangement of the elements of the immersive content presentation system, according to some embodiments.

FIG. 14C illustrates a block diagram of another alternate arrangement of the elements of the immersive content presentation system, according to some embodiments.

FIG. 15A illustrates an example of a motion sequence executed by a motion-capture performer that may be used to trigger actions by a game engine, according to some embodiments.

FIG. 15B illustrates a detailed view of portions of the motion-capture frame from the first pose and the second pose to demonstrate how the game engine can identify a predefined motion, according to some embodiments.

FIG. 15C illustrates an example of virtual effects that can be triggered based on identifying predefined motions and/or positions, according to some embodiments.

FIG. 16A illustrates how a motion-capture frame can interact with existing virtual objects in the 3-D virtual environment, according to some embodiments.

FIG. 16B illustrates a position that may be recognized by the game engine, according to some embodiments.

FIG. 16C illustrates how subsequent motions of the motion-capture frame can generate actions by the game engine in the 3-D virtual environment, according to some embodiments.

FIG. 16D illustrates how the motion-capture frame can become uncoupled from a virtual object, according to some embodiments.

FIG. 17A illustrates how the actions of other CGI characters in the 3-D virtual scene can be governed by the motion and/or position of a motion-capture actor, according to some embodiments.

FIG. 17B illustrates an identified motion of the motion-capture frame that changes the state in the animation graph of each of the CGI characters, according to some embodiments.

FIG. 18 illustrates a flowchart of a method for governing virtual animations by identifying predefined motions/positions of a motion-capture performer, according to some embodiments.

FIG. 19 illustrates one example of a multi-venue distribution of the immersive content presentation system, according to some embodiments.

FIG. 20 illustrates a composited view of the second stage with a rendered image of the re-skinned motion-capture performer, according to some embodiments.

FIG. 21 illustrates how two human performers can control elements of the 3-D environment through predetermined motions and/or positions, according to some embodiments.

FIG. 22 illustrates a view of the scene from FIG. 21 from the perspective of a member of the audience, according to some embodiments.

FIG. 23 illustrates a flowchart of a method for providing a shared-reality experience, according to some embodiments.

FIG. 24 illustrates a diagram of the communication protocol, according to some embodiments.

FIG. 25 illustrates a transmission using the communication protocol that supports remote rendering, according to some embodiments.

FIG. 26 illustrates a transmission using the communication protocol that supports centralized rendering, according to some embodiments.

FIG. 27 illustrates a transmission using the communication protocol that supports remote rendering/compositing and image capture, according to some embodiments.

FIG. 28 illustrates a transmission using the communication protocol that supports central rendering and remote compositing, according to some embodiments.

FIG. 29 illustrates a flowchart of a method for using a communication protocol that efficiently shares information in a shared-reality immersive content presentation system, according to some embodiments.

FIG. 30 illustrates a computer system in which various embodiments described herein may be implemented.

DETAILED DESCRIPTION

The embodiments described herein use real-time technology, such as real-time location determination, real-time motion capture, real-time performance capture, real-time simulation, and real-time rendering, to place computer-generated visual effects into a live performance area. Instead of using precomputed visual effects that require content rendered off-line to be synced and choreographed to a live performance according to timing cues, the embodiments described herein use a live performance of a human subject to drive the computer-generated imagery (CGI) that is composited in real time onto a display device. Rather than having the performer chase and/or react to CGI effects, these embodiments allow the performer to control and orchestrate a live performance and trigger various computer-generated visual effects.

Digital artists have long sought to integrate human performances together with animated CGI performances. This has been accomplished mainly in a production environment where scripted human sequences can be composited with pre-animated CGI performances. The combination of these elements can be displayed in the same video sequence to give the illusion that the CGI character and the human character coexist in the scene. However, making interactions between the CGI character and the human character appear seamless and realistic requires rehearsal and extensive planning and timing considerations. This is particularly true for interactions between a human actor and elements or characters in a virtual environment. For example, to realistically simulate a human actor lifting a digital character, the timing, placement, motion sequence, and speed of the human actions must be pre-scripted and built into the pre-animated sequence of the digital character. In other words, the human character has to meticulously follow the lead of the digital character. Any deviation from the pre-planned sequence will create visual and/or auditory discontinuities in the presentation that are immediately apparent to audiences. These discontinuities destroy the feeling that producers seek to create in a shared virtual/augmented reality environment.

The embodiments described herein overcome these technical challenges. Instead of forcing the human actor to follow the lead of the digital animation, these embodiments allow the human actor to drive the performance improvisationally and in real time. Instead of following the performance of a digital character, the human performer can instead drive the performance of the digital character based on their motion and position cues. In essence, the performer is able to control and orchestrate the live performance of the digital assets that are part of a corresponding virtual 3-D scene. These digital assets may include character animations, props, background environments, visual effects, and other elements available in virtual 3-D scenes.

The embodiments described herein are geared towards both the home environment and the public performance environment. These embodiments can be used to enhance a gaming or home theater experience that includes a television screen, an AR/VR headset, a mobile phone, and/or a computer. These embodiments can also be used to enhance a public performance that includes a plurality of AR/VR headsets, one or more performance stages located in different physical areas, a computer server, a plurality of mobile computing devices, and/or a motion-capture system. This disclosure will first describe a performance scenario that can also be duplicated in any home environment, where a human performer triggers the generation and/or behavior of a digital asset provided to one or more viewers as a composited image. This disclosure will then describe game engine technology that has been enhanced by these embodiments to recognize motions and/or positions from a motion-capture system and transition/blend between animation states for animated characters and/or objects. This disclosure will then describe a new transmission protocol that enables both the large-audience performance scenarios and the home theater/gaming experience. This disclosure finally describes how these technologies can be combined to create a real-time live performance that mixes human and CGI assets across a number of different physical performance areas.

Embodiments are directed at an immersive content presentation system. For example, immersive content (e.g., virtual reality content, mixed reality content, augmented reality content, etc.) may be presented to a user wearing or holding an immersive device (e.g., virtual reality [VR] goggles, augmented reality [AR] glasses, tablets, smartphones, etc.). As described herein, the real-time immersive content presentation system may also be referred to as simply the content presentation system or presentation system.

In some embodiments, the presentation system may manage the integration of content from one or more sources and present the integrated content to one or more users who are using or wearing immersive devices. In one embodiment, an immersive device may be a pair of AR glasses. In such an instance, content may be displayed over a display of the AR glasses and integrated with the physical environment viewable through translucent portions of the AR glasses display. In another aspect, an immersive device may be a set of virtual reality goggles, a mobile device, or a computing device (e.g., laptop computer, tablet device, smartphone, smart television, etc.). In some embodiments, a camera of the virtual reality goggles, mobile device, or computing device may capture images (e.g., video frames) of the surrounding physical environment. The images may be integrated with content obtained by the presentation system. The resulting integrated images may then be presented to a user via a display of the virtual reality goggles, computing device, or mobile device.

FIG. 1 illustrates a first performance area 102 where images and positions of characters and props may be captured by the presentation system. In some embodiments, the first performance area 102 may be set up in front of a live audience. Although not shown explicitly in FIG. 1, the live audience may surround a portion of the first performance area 102. The first performance area 102 may be characterized as a real-world performance area having an actual human character 108, such as an actor or audience member. The first performance area 102 may also include physical props, such as a table 110, a chair 112, a door 116, a window 114, and so forth. The first performance area 102 does not necessarily require any enhancements or additional material to make the first performance area 102 suitable for inclusion in the immersive content presentation system. For example, the first performance area 102 does not require any motion-capture fiducials, any motion-capture suits to be worn by the human character 108, or any "green screen" chroma-key backgrounds. Instead, the first performance area 102 may be configured using ordinary props, actors, scenery, lighting, and so forth.

In some embodiments, the human character 108 need not wear any special clothing or devices to interact with the immersive content presentation system. However, in some embodiments, the human character 108 may use one or more immersive devices to interact with immersive content. For example, the human character 108 may wear a pair of VR goggles or a pair of AR glasses. As described below, this may allow the human character 108 to see additional immersive content that is composited with their view of the first performance area 102. In some embodiments, the human character 108 may use a mobile computing device, such as a smartphone or tablet computer. The human character 108 can view portions of the first performance area 102 through a camera of the mobile computing device and see those portions of the first performance area 102 on the screen of the mobile computing device. Thus, the human character 108 can view portions of the first performance area 102 through the "window" of the mobile computing device. When viewed in this manner, the immersive content presentation system can composite additional immersive content, such as virtual assets, virtual props, virtual characters, CGI effects, and other computer-generated images, onto the view of the first performance area 102 as seen by the human character 108. In some embodiments, the human character 108 may also wear motion-capture items, such as visual fiducials, motion-capture suits, and other devices/clothing that facilitate a motion-capture system as described below.

Some embodiments of the first performance area 102 may include a visible-light camera 106. In the example of FIG. 1, the first performance area 102 includes a single visible-light camera 106. The visible-light camera 106 may capture a view of the first performance area 102 from a single perspective. In some embodiments, the camera 106 can be stationary, while in other embodiments, the camera 106 can be mounted to a track and movable during the performance. As will be described below, the images in a video sequence captured by the camera 106 may serve as a background for a compositing operation. Specifically, any virtual assets that are rendered for display can be composited on top of a stream of images captured by the camera 106.

In some embodiments, the camera 106 may be supplemented or replaced by a plurality of visible-light cameras distributed around the first performance area 102. For example, the camera 106 may be supplemented and/or replaced by a plurality of mobile computing devices that are held by audience members surrounding the first performance area 102. As will be described below, each of these mobile computing devices may capture images of the first performance area 102 from the perspective of each audience member. Any virtual assets that are added to the corresponding virtual scene can then be rendered from the point of view of each individual mobile computing device and displayed such that each user has a unique view of the first performance area 102.

In some embodiments, the camera 106 may be replaced or supplemented by one or more augmented reality devices. For example, one or more audience members surrounding the first performance area 102 may wear pairs of augmented reality glasses. Instead of compositing rendered virtual assets on top of a two-dimensional image of the first performance area 102 captured by a visible-light camera, these embodiments can alternatively display the rendered virtual assets on the AR glasses on top of each audience member's natural view of the first performance area 102.

In some embodiments, the first performance area 102 may be equipped with one or more depth cameras 104. The depth cameras 104 may comprise a motion-sensing input device with a depth sensor. The depth sensor may include a monochrome CMOS sensor and an infrared projector. The infrared projector can project infrared light throughout the first performance area 102, and the sensor can measure the distance to each point of reflected IR radiation in the first performance area 102 by measuring the time it takes for the emitted infrared light to return to the sensor. Software in the depth cameras 104 can process the IR information received from the depth sensor and use an artificial-intelligence machine-learning algorithm to map the visual data and create 3-D depth models of solid objects in the first performance area 102. For example, the one or more depth cameras 104 can receive emitted IR radiation to generate 3-D depth models of the human character 108, the table 110, the chair 112, the door 116, and the window 114, along with the floor, walls, and/or ceiling of the first performance area 102. In one test embodiment, the first performance area 102 was surrounded by six to eight Kinect® cameras to capture depth information of objects and performers in the first performance area 102.
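As an illustration of the underlying time-of-flight principle (a minimal sketch, not tied to any vendor's SDK), the round-trip time of an IR pulse can be converted into a distance using the speed of light:

```python
# Illustrative time-of-flight depth calculation.
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def depth_from_round_trip(round_trip_seconds: float) -> float:
    """Convert the round-trip time of an IR pulse into a distance in meters.

    The emitted light travels to the object and back, so the one-way
    distance is half of the total distance traveled.
    """
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# Example: a pulse returning after ~33.4 nanoseconds reflects off an
# object roughly 5 meters from the sensor.
print(depth_from_round_trip(33.4e-9))  # ~5.0
```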

FIG. 2A illustrates a depth image that may be captured by the one or more depth cameras 104. Each individual depth camera 104 measures the depth of objects in relation to the camera 104 itself. However, using a ray-tracing algorithm, a plurality of simultaneous depth images can be combined to generate a 3-D depth model of the first performance area 102. This example illustrates how the 3-D depth model of the first performance area 102 may look. FIG. 2A illustrates a 3-D virtual scene that is constructed to approximate the first performance area 102. The 3-D virtual scene may be loaded or generated in a standard package of 3-D modeling software. Objects that are created by the 3-D depth model of the first performance area 102 can be used to generate 3-D virtual objects in the 3-D virtual scene. For example, an object 220 can be created as a volumetric model in the 3-D virtual scene representing the table 110 from the first performance area 102. Similarly, an object 222 can be created as a volumetric model in the 3-D virtual scene representing the chair 112 from the first performance area 102.
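A hedged sketch of this fusion step, assuming each depth camera has a known pose (rotation R and translation t) in a shared world coordinate system and a simple pinhole model, might unproject each depth pixel into world space and merge the resulting point clouds:

```python
import numpy as np

def unproject_depth(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float,
                    R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Convert an HxW depth image (meters) into Nx3 world-space points.

    (fx, fy, cx, cy) are pinhole intrinsics; (R, t) map camera
    coordinates into the shared world frame.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0                      # ignore pixels with no IR return
    x = (u.ravel() - cx) * z / fx      # back-project through the pinhole
    y = (v.ravel() - cy) * z / fy
    cam_points = np.stack([x, y, z], axis=1)[valid]
    return cam_points @ R.T + t        # move into the shared world frame

def fuse_depth_cameras(frames) -> np.ndarray:
    """Merge simultaneous depth frames from several cameras into one cloud."""
    clouds = [unproject_depth(f["depth"], f["fx"], f["fy"], f["cx"],
                              f["cy"], f["R"], f["t"]) for f in frames]
    return np.concatenate(clouds, axis=0)
```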

In addition to creating volumetric models representing real-world objects in the first performance area 102, the immersive content presentation system may also generate real-time volumetric models of characters and other movable objects in the first performance area 102. For example, an object 228 can be generated to represent the human character 108 from the first performance area 102. As the human character 108 moves around the first performance area 102, the depth cameras 104 can capture depth images of her movement at interactive frame rates, such as 30 frames per second (fps), 50 fps, 25 fps, 20 fps, 15 fps, 10 fps, and/or the like. Each set of images captured simultaneously by the depth cameras 104 can be combined to generate a real-time 3-D volumetric model that can be used to generate the object 228 representing the human character 108. This system provides the advantage that any performance area can be modeled and re-created in real time in a virtual 3-D environment without pre-existing knowledge of the objects, characters, and/or arrangement of each performance area. Instead, the depth cameras 104 can create a real-time model of moving and stationary objects in the performance area that can be used in a 3-D virtual scene before adding additional virtual assets as described below.

FIG. 2B illustrates a 3-D virtual model of the first performance area 102 using predefined 3-D objects, according to some embodiments. Some embodiments may not require the depth cameras 104 illustrated in FIG. 1. Instead, pre-existing 3-D models of objects in the first performance area 102 can be inserted in the 3-D virtual scene 202. If the physical configuration of the first performance area 102 is known beforehand, the 3-D virtual scene 202 can be constructed from pre-existing digital models before the performance takes place. A model 210 of a table can be used to represent the table 110 from the first performance area 102, and the model 210 can be placed in a position and orientation that matches the physical table 110 in the first performance area. Similarly, a model 212 representing a chair can be placed in a position and orientation that matches the physical chair 112 in the first performance area 102.

Some embodiments may adapt to choreographed changes in the first performance area 102. For example, if the chair 112 in the first performance area 102 changes from an upright position to a reclining position at a certain time during the performance, the model 212 representing the chair in the 3-D virtual scene 202 can be programmed to also transition to a reclining position at the same time. Alternatively, the immersive content presentation system may accept human inputs in real time during the performance to manipulate objects in the 3-D virtual scene 202. For example, if the human character 108 moves the chair 112 into a reclining position in the first performance area 102, a human operator can provide an input to the immersive content presentation system that changes the position of the model 212 representing the chair 112 such that objects in the 3-D virtual scene 202 reflect changes made to the first performance area 102 in real time.

In the example of FIG. 2B, the 3-D virtual scene 202 does not show a model for the human character 108. Some models may be omitted from the 3-D virtual scene if they lack a predefined model in a content library or if they move dynamically in the first performance area 102 without a predefined, choreographed motion pattern. For example, if the human character 108 is free to move about the first performance area 102, a stationary model in the 3-D virtual scene may not be required. However, some embodiments may place a model that approximates the form of the human character 108 in the 3-D virtual scene 202. A human operator can move the model around the 3-D virtual scene 202 in real time to match the movements of the human character 108 in the first performance area 102. This may be useful in embodiments where the human character 108 interacts with virtual characters or objects that are added to the 3-D virtual scene 202 as described below.

Some embodiments may combine the concepts of FIG. 2A and FIG. 2B. For example, the volumetric object 220 from FIG. 2A may be generated by virtue of the depth cameras 104 distributed about the first performance area 102. As illustrated in FIG. 2B, these volumetric objects lose the texture, color, and other fine surface details of the objects they are modeling. However, the volumetric characteristics of the volumetric object 220 can be used to select a predefined digital model of the table 110 from a pre-existing library of objects. For example, the volumetric characteristics of the object 220 can be compared to the volumes of pre-existing tables. The immersive content presentation system can then select a table from a content library that is most similar to the volume of the table 110. The volumetric objects 220, 222 can then be replaced with the 3-D models 210, 212 in the 3-D virtual scene 202. This also allows new objects to be added to the 3-D virtual scene 202 during the performance. For example, if a new human character enters the first performance area 102, the volume of the new human character as approximated by the depth cameras 104 can be used to select a digital model of a human character from a content library that is the same as or similar to the new human character in the first performance area. As the new human character moves throughout the first performance area 102, the model of the new human character selected from the content library can be moved throughout the 3-D virtual scene 202 based on the motion of the volume detected by the depth cameras 104. This provides a real-time 3-D virtual scene 202 that is nearly identical to the physical scene in the first performance area 102.
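One plausible way to implement this matching, sketched below with hypothetical asset names and a simple bounding-box metric, is to compare the extents of a captured volumetric object against each library entry and select the closest fit within a tolerance:

```python
import numpy as np

# Hypothetical content library: asset name -> bounding-box extents in meters.
CONTENT_LIBRARY = {
    "table_round": np.array([1.2, 1.2, 0.75]),
    "table_long":  np.array([2.0, 0.9, 0.75]),
    "chair":       np.array([0.5, 0.5, 0.9]),
}

def match_library_asset(points: np.ndarray, tolerance: float = 0.3):
    """Pick the library asset whose extents best match a captured volume.

    `points` is an Nx3 world-space point cloud for one segmented object.
    Returns the asset name, or None if nothing matches within `tolerance`.
    """
    extents = points.max(axis=0) - points.min(axis=0)
    best_name, best_err = None, float("inf")
    for name, asset_extents in CONTENT_LIBRARY.items():
        err = float(np.linalg.norm(extents - asset_extents))
        if err < best_err:
            best_name, best_err = name, err
    return best_name if best_err <= tolerance else None
```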

In addition to capturing a real-time visible-light video of the first performance area 102 and generating a 3-D virtual scene 202 that corresponds in real time to the objects and movements in the first performance area 102, the immersive content presentation system can insert additional virtual assets into the 3-D virtual scene 202 that can then be rendered and displayed for viewers of the first performance area 102 in real time. A human actor can be used to drive the animation, selection, placement, and/or other visible characteristics of any virtual asset added to the 3-D virtual scene 202. As described below, the motion and/or position of the human actor can be used as a trigger input to a game engine that inserts or alters the virtual assets in the 3-D virtual scene 202 in real time.

In some embodiments, the content presentation system may capture a performance being performed by a performer in a second performance area. The performance area may be, for example, a movie/television set, a stage, a stadium, a park, etc. During the performance, the content presentation system may detect the motion and/or positioning of the performer. Such detection may be based on markers or sensors worn by the performer, depth and/or other motion detection sensors of the content presentation system, motion-capture cameras, and/or the like. For example, an array of depth sensors may be positioned in proximity to and directed at the performance area, such as surrounding the perimeter of the performance area. In some embodiments, the depth sensors measure the depth of different parts of the performer in the performance area over the duration of a performance. The depth information may then be stored and used by the content presentation system to determine the positioning of the performer throughout the performance.

FIG. 3 illustrates a second performance area 302 for a motion-capture performer 308 to drive the visible characteristics of one or more virtual assets, according to some embodiments. The second performance area 302 may include a plurality of motion-capture cameras 304 that are part of a motion-capture system. For example, one test embodiment has used a VICON® motion-capture system that includes the plurality of motion-capture cameras 304 and a server process that processes the motion information captured by the motion-capture cameras 304. The motion-capture performer 308 may wear special clothing and/or objects for the motion-capture process. For example, some motion-capture systems use balls that are coated with reflective tape or other reflective surfaces. These can be attached (e.g., velcroed) to a bodysuit made of spandex or lycra worn by the motion-capture performer 308. Other types of visual fiducials may be used as well to track the motion of the motion-capture performer 308, such as barcodes, QR codes, lights, crosses, and/or other visual patterns or geometries.

In some embodiments, the second performance area 302 may be within view of the first performance area 102. For example, the motion-capture actor 308 may be able to see the first performance area 102, including the human character 108, the table 110, and so forth. Similarly, the human character 108 may be able to see the second performance area 302, including the motion-capture performer 308. This allows the human character 108 to see the movements and/or hear the voice of the motion-capture actor 308 and respond accordingly. For example, when the motion-capture actor 308 begins making a juggling motion, the human character 108 can applaud in reaction because the human character 108 can see the actions of the motion-capture actor 308. Thus, even though the first performance area 102 may be physically separated from the second performance area 302, humans in either performance area 102, 302 can interact with each other as though they share the same performance area. This allows the CGI character representing the motion-capture actor 308 to be inserted into a view of the first performance area 102 while the human character 108 interacts with the CGI character in a realistic, un-choreographed fashion.

In some embodiments, the first performance area 102 may be remotely located away from the second performance area 302. For example, the first performance area 102 may be located in a first building or structure while the second performance area 302 is located in a second building or structure. The first performance area 102 may be remotely located such that it is a distance of greater than 1 mile away from the second performance area 302. The first performance area 102 may be located such that it is not visible to the motion-capture performer 308, and such that the second performance area 302 is not visible to the human character 108. In this configuration, the second performance area 302 may be equipped with a camera that captures a real-time video of the motion-capture performer 308 and displays this video in real time on a display screen that is visible to the human character 108 in the first performance area 102. Similarly, the first performance area 102 may broadcast video captured by the camera 106 to a display screen that is visible to the motion-capture performer 308 in the second performance area 302 in real time. This allows an un-choreographed, interactive performance between the human character 108 and a CGI character represented by the motion-capture actor 308 even though the two performance areas 102, 302 may be physically separated by large distances.

In some embodiments, the first performance area 102 may be combined with the second performance area 302 into the same physical performance area. For example, the combined performance area may include the depth cameras 104, the camera 106, and/or the motion-capture cameras 304. The combined performance area may also include the human character 108, the table 110, the chair 112, and the motion-capture performer 308 in the same set. The depth cameras 104 can still capture a depth image of the combined set as described above, and the motion-capture cameras 304 can still generate the motion-capture data for the motion-capture performer 308. As described below, a view of the combined performance area on a display device, such as a display screen, VR goggles, AR glasses, a mobile computing device, etc., can "re-skin" the motion-capture performer 308 with an alternate CGI representation such that the view of the motion-capture performer 308 is replaced by the CGI representation in the composited view of the combined performance area described below.

FIG. 4 illustrates a resulting motion-capture output derived from the images captured by the motion-capture cameras 304, according to some embodiments. The motion-capture output can be used to generate a representation of the motion-capture performer 308 in the second performance area 302. In this example, each of the visual fiducials used to track the motion of the motion-capture performer 308 can be represented by a vertex, and each of the vertices can be connected to represent a skeleton or wireframe of the motion-capture performer 308. As depicted in FIG. 4, a 3-D representation of the motion-capture performer, referred to as the motion-capture frame 404, can be created in a 3-D environment. The visual fiducials on the motion-capture suit of the motion-capture performer 308 are tracked in real time by the motion-capture system, and the vertices of the motion-capture frame 404 can move accordingly in real time in the 3-D environment to exactly mimic the motions of the motion-capture performer 308. For example, when the motion-capture performer 308 begins making a juggling motion with his hands and arms, the frame 404 may begin moving its "hands and arms" in a corresponding fashion.
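A minimal sketch of this real-time mirroring, assuming the motion-capture system delivers labeled fiducial positions on each capture tick (the class and fiducial names below are illustrative), might look like this:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class MotionCaptureFrame:
    """Skeleton/wireframe mirroring a performer's tracked fiducials."""
    vertices: Dict[str, Vec3] = field(default_factory=dict)
    # Pairs of fiducial names whose vertices are connected by "bones".
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def update(self, tracked_fiducials: Dict[str, Vec3]) -> None:
        """Move each vertex to its fiducial's latest tracked position."""
        for name, position in tracked_fiducials.items():
            self.vertices[name] = position

# On each capture tick, the motion-capture system's latest solve drives the frame:
frame = MotionCaptureFrame(edges=[("left_shoulder", "left_elbow"),
                                  ("left_elbow", "left_wrist")])
frame.update({"left_shoulder": (0.0, 1.5, 0.0),
              "left_elbow": (0.2, 1.2, 0.0),
              "left_wrist": (0.4, 1.0, 0.1)})
```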

As will be described in greater detail below, the motions of the frame 404 can be translated into a set of inputs for a game engine. A game engine comprises a software development and execution environment designed to support a video game or virtual-reality experience. A game engine may include several core functional blocks, comprising a rendering engine for 2-D or 3-D graphics, a physics engine including collision detection and/or response, a sound engine, scripting pipelines, animation sequencing, artificial intelligence, network communications, memory management, process threading, scene graph support, and video processing. Most relevant to the embodiments described herein, a game engine may provide a physics system that simulates physical interactions between virtual objects based on programmable physics rules. This can include collisions, friction, reactions, and/or other inertial characteristics of physical objects modeled in the defined physics environment. For example, a game engine can receive inputs that would affect or apply a simulated force to a virtual object, and the game engine can output the resulting movements of the affected object.

In some embodiments, an existing game engine can be modified to include code that recognizes predefined movements of the frame 404 and triggers the insertion and/or motion of predefined virtual assets in a 3-D environment accordingly. FIG. 5 illustrates a view of a 3-D environment 502 that includes virtual assets that are generated and/or controlled by movements of the motion-capture frame, according to some embodiments. In this example, the immersive content presentation system can cause the frame 404 representing the position and/or movements of the motion-capture performer 308 to be "re-skinned," which may include adding additional textures and/or volumetric features to the representation of the frame 404 to provide a different appearance. For example, in FIG. 5, the frame 404 has been replaced with a 3-D model of a clown 506. In some embodiments, the 3-D model of the clown 506 may include a full 3-D character model of a clown character with control points that are linked to the vertices on the frame 404. The movements of the vertices of the frame 404 may then drive corresponding movements of the 3-D model of the clown 506. The 3-D model of the clown 506 can be placed in the 3-D environment 502 in a location that corresponds to the location of the motion-capture performer 308 in the second performance area 302.
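The link between frame vertices and character control points could be expressed as a simple retargeting map; the following self-contained sketch uses hypothetical rig-control names for the clown, purely for illustration:

```python
# Illustrative retargeting map: frame vertex name -> character control point.
# The control-point names below are hypothetical rig controls on the clown.
RETARGET_MAP = {
    "left_wrist": "clown_left_hand_ctrl",
    "right_wrist": "clown_right_hand_ctrl",
    "head_top": "clown_head_ctrl",
}

def drive_character(frame_vertices: dict, character_controls: dict) -> None:
    """Copy each tracked frame vertex position onto its linked control point.

    `frame_vertices` maps vertex names to (x, y, z) positions from the
    motion-capture frame; `character_controls` holds the positions of the
    re-skinned model's control points.
    """
    for vertex_name, control_name in RETARGET_MAP.items():
        if vertex_name in frame_vertices:
            character_controls[control_name] = frame_vertices[vertex_name]

controls = {}
drive_character({"left_wrist": (0.4, 1.0, 0.1)}, controls)
print(controls)  # {'clown_left_hand_ctrl': (0.4, 1.0, 0.1)}
```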

The 3-D model of the clown 506 may be considered a "virtual asset" that is generated, added to the 3-D environment 502, and motion-controlled by the motion and/or position of the motion-capture performer 308. For example, the 3-D model of the clown 506 can be inserted into the 3-D environment 502 when the motion-capture performer 308 reaches a certain position in the second performance area 302. When the motion-capture performer 308 walks to the center of the second performance area 302, the 3-D model of the clown 506 can be generated in the 3-D environment 502. In another implementation, the 3-D model of the clown 506 can be inserted into the 3-D environment 502 whenever the motion-capture performer 308 is visible to the motion-capture cameras 304 in the second performance area 302. Similarly, when the motion-capture performer 308 moves to a predefined position or executes a predefined gesture/action, the 3-D model of the clown 506 can be removed from the 3-D environment 502. Thus, not only do the movements of the motion-capture performer 308 drive the movements of the 3-D model of the clown 506, but the position and/or movements of the motion-capture performer 308 may also trigger the generation/deletion of the 3-D model of the clown 506 from the 3-D environment 502. This functionality may be provided through the game engine, which has been modified to recognize predefined positions and/or movements of the frame 404 from the motion-capture system and trigger the generation, deletion, and/or motion of virtual assets in response.

The 3-D model of the clown 506 is used only as an example and is not meant to be limiting. In other embodiments, more elaborate and fantastical re-skins can be applied to the frame 404. For example, some embodiments use the motion of the frame 404 to drive humanoid or robotic characters, animal characters, human-animal hybrid characters, dinosaurs, alien creatures, objects, machinery, and/or any other virtual asset. Thus, the frame 404 can be replaced or re-skinned with any virtual asset, and the motion of the vertices of the frame 404 can be used to drive the corresponding motion, if any, of the virtual asset.

In addition to using the recognition of predefined positions and/or movements of the frame 404 to create a "re-skin" for the frame 404, the game engine can also recognize predefined positions and/or movements of the frame to generate visual effects and/or other visual assets. In the example of FIG. 5, the back-and-forth motion of the "hands" of the frame 404 would correspond to the juggling motion made by the motion-capture performer 308 in the second performance area 302. This back-and-forth motion of the vertices of the frame 404 can match a predefined movement pattern of those vertices that is recognized by the game engine. In response, the game engine can insert one or more digital effects into the 3-D environment 502. The game engine can insert 3-D models of objects that the 3-D model of the clown 506 can appear to be juggling, such as the 3-D models of the flaming balls 504 depicted in FIG. 5.
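One way such recognition could work, sketched here under the assumption that each predefined gesture is stored as a short template of per-vertex motion vectors (the template, threshold, and effect name are all illustrative), is to compare recent vertex displacements against each library entry:

```python
import numpy as np

def motion_vectors(prev: dict, curr: dict, names: list) -> np.ndarray:
    """Displacement of selected frame vertices between two capture ticks."""
    return np.array([np.subtract(curr[n], prev[n]) for n in names])

def matches_gesture(observed: np.ndarray, template: np.ndarray,
                    threshold: float = 0.05) -> bool:
    """True if observed per-vertex motion is close to a gesture template."""
    return float(np.mean(np.linalg.norm(observed - template, axis=1))) < threshold

# Hypothetical library entry: alternating up/down hand motion ("juggling").
JUGGLE_TEMPLATE = np.array([[0.0, 0.15, 0.0],    # left wrist moving up
                            [0.0, -0.15, 0.0]])  # right wrist moving down

def on_capture_tick(prev_vertices, curr_vertices, spawn_effect) -> None:
    """Check the latest frame motion and trigger a digital effect on a match."""
    observed = motion_vectors(prev_vertices, curr_vertices,
                              ["left_wrist", "right_wrist"])
    if matches_gesture(observed, JUGGLE_TEMPLATE):
        spawn_effect("flaming_balls")  # game engine inserts the 3-D effect
```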

Other visual effects may include generating mist or fog, causing lightning to strike, creating magic effects, such as electricity being emitted from a character's fingertips, generating fire or ice, changing the lighting of the 3-D environment 502, moving other virtual assets in the scene, and so forth. The following non-limiting list of examples illustrates some of the many conceivable virtual assets or effects that can be generated in the 3-D environment 502 in response to recognizing predefined motions of the frame 404. A character may slowly raise and lower their arms to increase or decrease the lighting in the 3-D environment 502. A character may point at an object and cause the object to lift off the ground as though by magic; the object may be purely virtual and visible in the view provided to an audience, or the object may be a physical object that is made to appear to move by compositing a rendered version of the object moving on top of the object visible to the audience. A character may perform a predefined finger gesture to cause lightning, fire, ice, or other substances to be emitted from their hands or fingertips. A character may lift a virtual object or cause it to shrink or grow in size by "squeezing" it between their arms or hands. A character may perform a predefined gesture that causes fireworks, fire, lightning, explosions, and/or other sudden occurrences. The modifications made to the game engine for these embodiments can be programmed to perform these and many other digital effects based on the recognition of predefined motions, gestures, and/or positions of the frame 404.

In addition to re-skinning the frame 404 with the 3-D model of the clown 506 and inserting a new set of visible objects comprising the 3-D models of the flaming balls 504, the recognition of predefined motions, gestures, and/or positions of the frame 404 can also drive the reactions of other digital CGI characters. In the example of FIG. 5, an observing character 508 has been added to the 3-D environment 502. The observing character 508 can be placed by default into the 3-D environment 502 at the beginning of the performance, or may be generated in response to the recognition of a predefined motion/gesture/position of the frame 404 from the motion-capture performer 308. For example, when the frame 404 walks into a center area of the 3-D environment 502, the game engine can cause the 3-D model of the observing character 508 to walk into the 3-D environment 502 using a predefined animation sequence.

Once the 3-D model of the observing character 508 has been added to the 3-D environment 502, the motion/gesture/position of the frame 404 can also trigger reactions from the observing character 508. For example, after the juggling motion has been performed for a predefined time interval, the game engine can cause the 3-D model of the observing character 508 to transition to an animation sequence that causes the observing character 508 to enthusiastically applaud the performance of the clown 506. Similarly, when the game engine recognizes that the juggling gesture performed by the frame 404 has stopped, the game engine can cause the 3-D model of the observing character 508 to transition back to an animation sequence where he stands still. In addition to providing a realistic, unscripted, real-time performance between the human character 108 and the 3-D model of the clown 506, this also allows the game engine to insert additional virtual characters into the performance such that the human character 108, the 3-D model of the clown 506, and the 3-D model of the observing character 508 are all able to interact with each other, with the scene being driven by the movements of the motion-capture performer 308.
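These transitions resemble a small animation state machine; a hedged sketch of such a graph for the observing character (the state and event names are illustrative, not from the disclosure) might look like this:

```python
# Illustrative animation graph for the observing character. Each entry maps
# (current_state, recognized_event) -> the next animation state to blend into.
OBSERVER_ANIMATION_GRAPH = {
    ("offstage", "performer_entered_center"): "walk_on",
    ("walk_on", "walk_finished"): "stand_idle",
    ("stand_idle", "juggling_sustained"): "applaud",
    ("applaud", "juggling_stopped"): "stand_idle",
}

def advance_animation(state: str, event: str) -> str:
    """Return the next animation state, or stay put if no edge matches."""
    return OBSERVER_ANIMATION_GRAPH.get((state, event), state)

state = "stand_idle"
state = advance_animation(state, "juggling_sustained")  # -> "applaud"
state = advance_animation(state, "juggling_stopped")    # -> "stand_idle"
```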

At this point, elements of the 3-D scene 502 may have been generated in a virtual environment, but they still remain separate from the physical environment of the first performance area 102. In order to integrate these two environments together for viewers of the first performance area 102, the immersive content presentation system can first combine the two environments in a virtual form, then generate a rendered 2-D image that can be composited in real time with the image from the camera 106 of the first performance area 102.

FIG. 6A illustrates a view of the 3-D environment 502 that incorporates elements based on the first performance area 102, according to some embodiments. FIG. 6A illustrates the same 3-D model of the clown 506, 3-D models of the flaming balls 504, and 3-D model of the observing character 508 that were present in the 3-D environment 502 of FIG. 5. However, FIG. 6A adds additional virtual objects that model the real-world objects from the first performance area 102. Specifically, objects created as volumetric models of the elements of the first performance area 102 have been added to the 3-D environment 502. For example, the object 220 representing the volumetric model of the table 110 has been added in front of the 3-D model of the clown 506. The object 222 representing the volumetric model of the chair 112 has been added behind the 3-D model of the observing character 508. The object 228 representing the volumetric model of the human character 108 has been added behind the object 220.

In this example, the physical areas in the first performance area 102 and the second performance area 302 have been combined into a single virtual area in the 3-D environment 502. Thus, the relative locations of the objects in each of the performance areas 102, 302 have been maintained in the 3-D environment 502. A single coordinate system may be used to place elements from each scene relative to each other. For example, if the center coordinates of the first performance area 102 are centered on the table 110, then the corresponding center coordinates of the second performance area 302 would be centered in front of the motion-capture performer 308, where the table should be. When combining elements from both of the performance areas 102, 302, this would place the object 220 representing the volumetric model of the table 110 in front of the 3-D model of the clown 506. In some embodiments, the second performance area 302 can be marked with cutouts, props, floor tape, holograms, or other representations of physical objects in the first performance area 102. These can provide visual cues to the motion-capture performer 308 such that they can act/move around the objects that would be present in the first performance area 102. This provides a more lifelike and realistic performance when the elements of both scenes are combined for the audience. Additional details for maintaining a consistent coordinate system between different performance areas are described in greater detail below.
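As a minimal sketch of such a shared coordinate system, assuming each performance area's placement in the virtual scene is described by a calibrated rigid transform (a rotation R and translation t, values below invented for illustration), every captured point can be mapped into the common frame:

```python
import numpy as np

def area_to_shared(points: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map Nx3 points from one performance area into shared scene coordinates.

    (R, t) is the rigid transform calibrated for that area, e.g., chosen so
    the center of the first performance area lands at the scene origin.
    """
    return points @ R.T + t

# Illustrative calibration: the second area is rotated 180 degrees about the
# vertical axis and offset so the performer stands behind where the table is.
R2 = np.array([[-1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, -1.0]])
t2 = np.array([0.0, 0.0, 1.5])
performer_point = np.array([[0.0, 1.7, 0.0]])   # head position in area 2
print(area_to_shared(performer_point, R2, t2))  # position in the shared scene
```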

This example uses the objects representing volumetric models of elements of the first performance area 102 as captured by the plurality of depth cameras 104. This is possible because these objects as captured by the depth cameras 104 are not actually rendered from the 3-D environment 502. Instead, they are placed in the 3-D environment 502 so that when images of the 3-D model of the clown 506 and the 3-D model of the observing character 508 are rendered and composited onto an image of the first performance area 102, the images of these characters can be seen moving seamlessly in front of, behind, and in between the objects in the first performance area 102. Therefore, an approximation of the volume of objects in the first performance area 102 can be used without requiring textures, colors, patterns, or other surface details that would not necessarily be captured by the depth cameras 104. The volumes alone may be sufficient to create cutouts in the rendered images of the virtual characters 506, 508, as described further below.

FIG. 6B illustrates a view of the 3-D environment 502 that incorporates elements based on the first performance area 102, according to some embodiments. FIG. 6B illustrates the same 3-D model of the clown 506, 3-D models of the flaming balls 504, and 3-D model of the observing character 508 that were present in the 3-D environment 502 of FIG. 5. However, in contrast to FIG. 6A, FIG. 6B adds the 3-D models 210, 212 that were selected in FIG. 2B. Specifically, the 3-D model of the table 210 and the 3-D model of the chair 212 have been placed into the 3-D environment 502 in their corresponding locations from the first performance area 102. As described above, these 3-D models 210, 212 can be selected by human inputs, such as an administrator designing the 3-D environment to load and place virtual assets corresponding to objects and locations from the first performance area 102. Alternatively, these 3-D models 210, 212 may be automatically selected by matching virtual assets from a content library or virtual asset data store that match the volumetric model created by the depth cameras 104 within a predetermined threshold.

In this example, the object 228 representing the volumetric model of the human character 108 has not been included in the 3-D environment 502. Some embodiments may omit the model of the human character 108 because the human character 108 is dynamic and can move throughout the first performance area 102 in an unscripted manner. Alternatively, some embodiments can include a digital character model of the human character 108 in the 3-D environment 502 as described above. Some embodiments may also use the object 228 representing the volumetric depth model of the human character 108 from FIG. 6A. Thus, objects from volumetric depth models may be mixed and matched with models loaded from a content library in the 3-D environment 502 in any combination and without limitation. For example, objects that may be considered static, such as the table 110, may be represented by models loaded from a content library, while objects that may be considered dynamic in the scene, such as the human character 108, may be generated in real time in the 3-D environment 502 based on the real-time generation of the volumetric depth model from the depth cameras 104.

FIG. 7 illustrates a rendered 2-D image of the 3-D environment 502 before it is composited for display, according to some embodiments. Instead of simply rendering a view of the complete 3-D environment 502 for display to the audience, the embodiments herein combine some of these rendered elements with a real-time view of the first performance area 102 to provide an immersive mixed-reality experience. Recall that some objects and 3-D models were added to the 3-D environment 502 to represent real-world objects from the first performance area 102. However, because these objects already exist in images captured of the first performance area 102, they do not need to be rendered and composited for display to the audience, which would create duplicate, overlapping views of the same objects. Instead, the actual images of these objects can be included in the composited image to provide a more realistic and lifelike experience.

These objects and 3-D models may have been added to the 3-D environment 502 in order to create cutouts in the digital characters that are added to the scene by virtue of the performance of the motion-capture performer 308. For example, FIG. 6B shows the 3-D model of the clown 506 standing behind the 3-D model of the table 210. If the 3-D model of the clown 506 were simply rendered without the 3-D model of the table 210, then the entire clown would be visible in the rendered image. Consequently, when this image is composited with a real-world image of the first performance area 102, the clown would appear to be in front of the table 110.

In some embodiments, the 3-D environment 502 can be rendered, and pixels in the 2-D image 702 corresponding to objects and 3-D models added to the 3-D environment 502 that are based on real-world objects from the first performance area 102 can be removed from the rendered 2-D image 702. In the example of FIG. 7, pixels in the 2-D image 702 have been removed that correspond to the 3-D model of the table 210 and the 3-D model of the chair 212. Because the 3-D model of the chair 212 was behind the observing character 508 in the 3-D environment 502, the image of the observing character 708 has not been affected. The pixels rendered from the 3-D model of the chair 212 that show through from behind the image of the observing character 708 can be removed and replaced with transparent values. Alternatively, instructions can be provided to the rendering process of the game engine to not render the 3-D model of the chair 212 into the resulting 2-D image 702. Generally, pixels that are not rendered, background pixels, and pixels that are removed from the 2-D image 702 can be replaced with pixels having transparent values such that the 2-D image 702 can be composited with another background image.

Removing the 3-D model of the table 210 has a different effect than removing the 3-D model of the chair 212. Because the 3-D model of the clown 506 is behind the 3-D model of the table 210 from the view of the virtual camera rendering the 2-D image 702, removing the pixels resulting from the 3-D model of the table 210 will leave a cutout 710 in the 2-D image of the clown 706. The cutout 710 corresponds to the portions of the image of the clown 706 that would not be visible behind the 3-D model of the table 210 relative to the virtual camera of the rendering operation. As will be shown below, creating the cutout 710 allows the image of the clown 706 to appear behind an image of the table when the 2-D image 702 is composited with a real-world image of the first performance area 102.
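A minimal sketch of this cutout mechanism, assuming per-pixel depth buffers are available for the character render and for an occluder-only render; the function and array shapes are invented for illustration:

```python
import numpy as np

def apply_occluder_cutout(char_rgba: np.ndarray,
                          char_depth: np.ndarray,
                          occluder_depth: np.ndarray) -> np.ndarray:
    """Zero out the alpha of character pixels hidden behind an occluder.

    char_rgba:      HxWx4 rendered character layer (alpha in channel 3)
    char_depth:     HxW depth of the character render (inf = no character)
    occluder_depth: HxW depth of occluder-only render (inf = no occluder)
    """
    out = char_rgba.copy()
    hidden = occluder_depth < char_depth   # occluder is nearer to the camera
    out[hidden, 3] = 0.0                   # make those pixels transparent
    return out

# Toy 2x2 example: an occluder covers the left column at depth 1.0 while
# the character fills the frame at depth 2.0.
char = np.ones((2, 2, 4))
char_d = np.full((2, 2), 2.0)
occ_d = np.array([[1.0, np.inf],
                  [1.0, np.inf]])
cutout = apply_occluder_cutout(char, char_d, occ_d)
print(cutout[..., 3])   # left-column alpha is 0 -> the "cutout"
```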

The rendering operation that generates the 2-D image 702 can be captured from the virtual camera location corresponding to the location of the camera 106 in the first performance area 102. For example, image 702 can be rendered from a perspective of a camera positioned in a similar location in the first performance area 102. This may be true for compositing operations where the same composited image is displayed to all of the audience members, such as on a large display screen above or beside the first performance area 102. In other embodiments, multiple devices may be used to view the composited image. For example, audience members may each be wearing an individual pair of AR glasses, or audience members may be viewing the first performance area 102 through the camera display screen of mobile computing devices such as a smartphone or tablet computer. In these implementations, the rendering operation that generates the 2-D image 702 may take place once for each individual display device, and these rendering operations may be captured from virtual cameras in the 3-D environment 502 corresponding to the individual locations of the display devices. For example, for audience members wearing AR glasses, the rendering operation may generate a unique 2-D image 702 for each individual device that is based on its location relative to the first performance area 102. As described below, these display devices can provide location information to the rendering operation such that the rendering operation can determine the location of each virtual camera. Thus, each version of the rendered 2-D image 702 may appear different based on the location of the audience member. For example, as an audience member moves towards the left side of the first performance area 102, the cutout 710 will shift to the right in the 2-D image 702 and cover more of the image of the clown 706. This rendering operation can be distributed to a mobile device, or may be performed centrally at a single server or plurality of servers having sufficient processing power to render multiple images of the 3-D environment 502 simultaneously such that real-time video sequences can be generated for multiple viewing devices.
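As a rough sketch of this per-device rendering, the loop below places a hypothetical virtual camera at each reported device pose and produces one frame per device. The `render_view` stub stands in for whatever render call the game engine actually exposes:

```python
from dataclasses import dataclass

@dataclass
class DevicePose:
    device_id: str
    position: tuple      # (x, y, z) in shared scene coordinates
    orientation: tuple   # quaternion (w, x, y, z)

def render_view(scene, position, orientation):
    # Stand-in for the engine's render call; a real implementation would
    # return a 2-D RGBA frame rendered from a virtual camera at this pose.
    return f"frame of {scene} from {position}"

def render_for_all_devices(scene, poses):
    """Render one unique 2-D image per display device, keyed by device id."""
    frames = {}
    for pose in poses:
        # Each audience member's virtual camera mirrors their real-world
        # location, so occlusion cutouts shift correctly with viewpoint.
        frames[pose.device_id] = render_view(scene, pose.position,
                                             pose.orientation)
    return frames

poses = [DevicePose("glasses-1", (0.0, 1.6, 4.0), (1, 0, 0, 0)),
         DevicePose("glasses-2", (-2.0, 1.6, 3.5), (1, 0, 0, 0))]
print(render_for_all_devices("scene-502", poses))
```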

FIG. 8 illustrates a composited view 802 that combines an image of the first performance area 102 with the rendered 2-D image 702 of the 3-D environment 502, according to some embodiments. Additional composited views according to different embodiments will be discussed in subsequent figures. In this embodiment, the image of the first performance area 102 may include a real-time 2-D image captured from the camera 106. For example, the camera 106 may capture a real-time 2-D image of the character 808, as well as an image of the table 810 and an image of the chair 812, along with other visible components of the first performance area 102.

The capture of the image of the first performance area 102 and the display of the composited view 802 may occur in real-time. As used herein, the term “real-time” refers specifically to an interactive frame rate, or a frame rate at which there is not a noticeable, significant lag between what a viewer in the audience would observe in the first performance area 102 using their naked eye and the display of the composited view 802. In some cases, real-time implies less than a 1-second delay between capturing the image of the first performance area 102 and the display of the corresponding composited view 802.

Additionally, the examples described herein refer largely to a single composited view, a single 2-D image, a single 2-D rendering of the 3-D environment, and so forth. However, each of these views/images may be part of continuous real-time video streams. Thus, any reference to a single view/image should be understood to be applicable to a sequence of views/images that are presented as a real-time video stream comprised of a plurality of views/images. By applying the techniques and methods described herein to each view/image/frame in a real-time video stream or sequence, an audience member can enjoy an immersive experience in which real-world and virtual-world elements are combined in a single, continuous, live presentation. Similarly, any reference to a camera refers to a camera capable of capturing single images and/or capturing continuous video streams.

As described above, the composited view 802 may use images in the video stream of images captured by the camera 106 as one layer in the compositing operation. Additionally, the immersive content presentation system may use the rendered 2-D image 702 that includes the image of the clown 706, the image of the observing character 708, and the image of the flaming balls 704 as a second layer in the compositing operation. Because the location of the virtual camera used to render the 2-D image 702 corresponds to a location in the first performance area 102 of the camera 106, these two layers can be composited together to form the single composited view 802 of the complete scene.

As described above, the rendered 2-D image 702 can apply cutouts to rendered virtual assets that are to be inserted into the composited view 802. For example, the image of the clown 706 has a cutout 710 corresponding to the image of the table 810 from the view of the camera 106. Thus, when the two layers are composited together, the cutout 710 causes the image of the clown 706 to appear to be behind the image of the table 810. Again, this cutout/rendering of the 3-D environment 502 can be executed for each image frame in a video stream that includes the composited view 802. For example, if the motion-capture performer 308 begins to move around the second performance area 302 in a pattern that would walk around the location of the table 110 in the first performance area 102, the cutout 710 in the image of the clown 706 would be adjusted in each frame such that the image of the clown 706 would appear to walk around the image of the table 810 in the composited view 802. This cutout/rendering operation in a video stream makes the image of the clown 706 appear to be an integral part of the overall composited image 802, such that a viewer cannot tell that the image of the clown 706 is not a part of the real-world environment of the first performance area 102.
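The two-layer combination described above amounts to a standard alpha-over operation: the rendered layer, with its cutouts already encoded as transparent pixels, is placed over the live camera frame. A minimal sketch, assuming an 8-bit camera frame and a floating-point RGBA render layer:

```python
import numpy as np

def composite_over(render_rgba: np.ndarray, camera_rgb: np.ndarray) -> np.ndarray:
    """Alpha-over composite: rendered layer on top of the live camera frame.

    render_rgba: HxWx4 float array in [0, 1]; alpha is 0 where pixels were
                 cut out or never rendered.
    camera_rgb:  HxWx3 uint8 frame from the physical camera.
    """
    alpha = render_rgba[..., 3:4]             # HxWx1 for broadcasting
    fg = render_rgba[..., :3]
    bg = camera_rgb.astype(np.float64) / 255.0
    out = alpha * fg + (1.0 - alpha) * bg     # standard "over" operator
    return (out * 255.0).astype(np.uint8)

# Wherever the cutout zeroed the alpha, the live image of the table shows
# through, so the clown appears to stand behind the real table.
render = np.zeros((2, 2, 4))
render[0, 0] = [1.0, 0.5, 0.0, 1.0]           # one opaque rendered pixel
camera = np.full((2, 2, 3), 100, dtype=np.uint8)
print(composite_over(render, camera))
```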

The composited view 802 of FIG. 8 can be displayed on any suitable display device. In some embodiments, the composited view 802 can be displayed on a large display screen that is near the first performance area 102. This allows the audience to watch the performance of the human character 108 in the first performance area 102 and simultaneously see the resulting composited view 802 on the large display screen. For example, the large display screen may be an image projected on a wall, a large digital display in a concert venue or sports arena, a television screen, a “Jumbotron,” or any other display device. In some embodiments, the composited view 802 can be displayed on a screen that is part of the first performance area 102. For example, one of the walls in the first performance area 102 may be implemented using a display screen or wall projection that displays the composited view 802 behind the performance of the human character 108.

The composited view 802 may also be displayed on a pair of VR goggles worn by audience members. In some embodiments, the composited view 802 may be the same for each audience member regardless of their position in the audience surrounding the first performance area 102. In some embodiments, the composited view 802 may be different for each audience member and may depend at least in part on their position relative to the first performance area 102. Operations for rendering specific views of the 3-D environment 502 based on a position of each audience member are described in greater detail below relative to FIG. 9 and FIG. 10.

FIG. 9 illustrates a display of a composited view 902 on a pair of AR glasses 904, according to some embodiments. In comparison to the display of the composited view 802 in FIG. 8, this composited view 902 includes a layer comprising the natural view of the first performance area 102 that is visible through and around the AR glasses 904. An audience member wearing the AR glasses 904 would be able to see this natural view of the first performance area 102 using their naked eye. Additionally, the second layer of the compositing operation includes the rendered 2-D image 702 that includes the image of the clown 706, the image of the observing character 708, and the images of the flaming balls 704. This compositing layer is displayed by the AR glasses 904. Therefore, in these embodiments, the “compositing” operation includes displaying the rendered 2-D image 702 on the AR glasses 904 such that elements of the 2-D image 702 are overlaid on the natural view of the first performance area 102 when viewed through the AR glasses 904.

The immersive content presentation system may include additional devices for audiences who view the first performance area 102 through multiple pairs of AR glasses 904. In some embodiments, audience members may be equipped with individual pairs of AR glasses 904, and may each view the first performance area 102 from a unique position surrounding the first performance area 102. Therefore, one or more central computing devices or servers may be in wired or wireless communication with each of the pairs of AR glasses 904.

In some embodiments, the immersive content presentation system can determine the location and/or orientation of each pair of AR glasses 904 individually. This can be done using RF devices that are placed on each side of the AR glasses 904. The position of each pair of AR glasses 904 can then be triangulated using a plurality of RF transmitters/receivers positioned around the audience of the first performance area 102. The location of the AR glasses 904 can also be determined by placing visual fiducials on the glasses that can be visually tracked by motion-capture cameras. Additional motion-capture cameras can then be placed around the audience to determine the locations of the AR glasses 904 in real time. These embodiments allow users to bring their own AR glasses 904 and apply the RF tags, visual fiducials, and so forth, when entering the audience area.

In some embodiments, the immersive content presentation system can receive a location that is determined by the AR glasses 904 themselves. For example, the AR glasses 904 may be equipped with accelerometers, motion sensors, gyroscopes, gravitational sensors, digital compasses, and other location/orientation determination devices. Based on a calibrated starting point that can be determined and transmitted at the beginning of the performance, each pair of AR glasses 904 can track their position/orientation relative to the starting point throughout the performance. This position/orientation data can then be transmitted in real-time to the server of the immersive content presentation system.
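As a simplified illustration of such self-tracking, the sketch below double-integrates accelerometer samples from a calibrated starting point. Real devices fuse several sensors and correct for drift; the sample format and rates here are assumptions:

```python
import numpy as np

def dead_reckon(start_pos: np.ndarray,
                start_vel: np.ndarray,
                accel_samples: list,
                dt: float) -> np.ndarray:
    """Estimate position by double-integrating acceleration samples.

    A toy model of on-device tracking: each pair of AR glasses starts
    from a calibrated point and integrates its motion sensors from there.
    Real systems also fuse gyroscope/compass data and periodically
    re-anchor, since raw integration drifts quickly.
    """
    pos, vel = start_pos.astype(float), start_vel.astype(float)
    for a in accel_samples:
        vel = vel + a * dt
        pos = pos + vel * dt
    return pos

# One second of gentle forward acceleration sampled at 100 Hz.
samples = [np.array([0.1, 0.0, 0.0])] * 100
print(dead_reckon(np.zeros(3), np.zeros(3), samples, dt=0.01))
```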

Because each pair of AR glasses 904 in the audience may be positioned at a unique location, the composited view 902 may look different through each pair of AR glasses 904. Therefore, the immersive content presentation system can generate rendered 2-D images 702 that are based on the unique position of each of the AR glasses 904. For example, the 3-D environment 502 may be the same for each audience member. However, the position of the virtual camera that is used to render the 2-D image 702 from the 3-D environment 502 may change for each audience member. Specifically, the virtual camera may be positioned in the 3-D environment 502 for each audience member at a location that corresponds to the location of their pair of AR glasses 904 in the audience area of the first performance area 102. An audience of 25 people would thus generate 25 versions of the 2-D image 702, each being rendered from a different perspective corresponding to the locations of the AR glasses 904 of each audience member.

As described below, a specialized protocol has been developed for handling the real-time broadcasts of camera images, rendered images, and 3-D environment information to accommodate various hardware and software configurations that may be present in different embodiments of the immersive content presentation system. In some embodiments, rendering images of the 3-D environment 502 for each pair of AR glasses 904 to generate the unique 2-D images 702 may be performed at a central server. As described below, this central server may include a game engine. The central server may have significant processing power that allows the immersive content presentation system to generate a large number of rendered images of the 3-D environment 502 simultaneously for real-time broadcast to multiple display devices. This also allows the AR glasses 904 to be relatively lightweight in hardware and software processing power and simply receive rendered images from the central server.

Additionally or alternatively, some embodiments may allow the rendering operation to take place at each mobile device carried by the audience members. For example, some commercially available AR glasses 904 may have sufficient processing power onboard to perform real-time rendering operations of the 3-D environment 502. Instead of transmitting rendered images, the central server can instead transmit real-time transforms that are applied to the 3-D environment 502. For example, the 3-D environment 502 may be represented in a virtual 3-D scene that is stored on each mobile device. When the motion-capture performer 308 moves throughout the second performance area 302, the immersive content presentation system can send transforms that should be applied to the motion-capture frame 404 representing the motion-capture performer 308. The individual mobile devices of each audience member can receive these transforms and apply them in real-time to the 3-D model of the clown 506, the 3-D model of the observing character 508, etc. Each mobile device can then perform its own render of the 3-D environment 502 from its own perspective and display the rendered 2-D images 702 (with any appropriate cutouts 710) on each individual pair of AR glasses 904.
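A rough sketch of what a transform stream of this kind could look like, assuming one JSON message per update; the field names and helper are illustrative placeholders, not the customized communication protocol described later in this disclosure:

```python
import json
from dataclasses import dataclass

@dataclass
class SceneObject:
    position: list   # [x, y, z]
    rotation: list   # quaternion [w, x, y, z]

# Each mobile device keeps its own local copy of the 3-D scene...
local_scene = {
    "clown_506": SceneObject([0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]),
    "observer_508": SceneObject([2.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]),
}

def apply_transform(scene: dict, message: str) -> None:
    """Apply one streamed transform update to the locally stored scene."""
    update = json.loads(message)
    obj = scene[update["object_id"]]
    obj.position = update["position"]
    obj.rotation = update["rotation"]

# ...and the server streams only transforms, not rendered frames.
msg = json.dumps({"object_id": "clown_506",
                  "position": [0.5, 0.0, 0.2],
                  "rotation": [1.0, 0.0, 0.0, 0.0]})
apply_transform(local_scene, msg)
print(local_scene["clown_506"].position)   # -> [0.5, 0.0, 0.2]
```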

The various embodiments illustrated by FIG. 9 use AR glasses 904 as an example, but this is not meant to be limiting. Other embodiments that may use a similar technology include holograms that are projected into the first performance area 102, images that are projected onto a transparent screen that surrounds the first performance area 102, and/or any other technology that may composite a rendered 2-D image 702 onto a real-time natural view of the first performance area 102.

FIG. 10 illustrates an embodiment that displays a composited view 1002 on a mobile device 1004 equipped with a camera and a display screen, according to some embodiments. The mobile device 1004 may include any electronic device with a camera that can display real-time video streams on a corresponding display screen, such as a smart phone, a digital music player, a PDA, a tablet computer, a notebook computer, a laptop computer, and so forth. The mobile device 1004 may be held by each audience member or otherwise positioned in front of each audience member such that the camera of the mobile device 1004 can capture a view of the first performance area 102 and display that view of the first performance area 102 on the display screen of the mobile device 1004 such that it appears to the audience member that they are viewing the first performance area 102 through a “window” of the mobile device 1004. For example, a view of the human character 108 will look the same to the user whether the human character 108 is viewed through the mobile device 1004 or with the naked eye when the mobile device 1004 is moved out of the line of sight between the audience member and the human character 108.

As described in detail above, the immersive content presentation system may also receive a location and/or orientation of each mobile device 1004 for each audience member. As was the case with the AR glasses 904, the view through each mobile device 1004 may be unique and based at least in part on the location and/or orientation of each individual mobile device 1004. Therefore, each mobile device 1004 may be equipped with the internal location determination circuitry described above (e.g., accelerometers, gyroscopes, gravitational sensors, compasses, etc.), and the location and/or orientation can be transmitted from each mobile device 1004 to a central server in real-time. Alternatively or additionally, each mobile device 1004 may be equipped with devices that allow the immersive content presentation system itself to determine their location and/or orientation (e.g., RF tags, visual fiducials, etc.).

Using this unique location/orientation information for each mobile device 1004, the immersive content presentation system can render images of the 3-D environment 502 that correspond to the locations of each mobile device 1004 relative to the first performance area 102. As described above, a virtual camera that is used to render the 3-D environment 502 for each mobile device 1004 can be positioned in the 3-D environment 502 corresponding to the real-world position of the mobile device 1004 in the audience. Alternatively or additionally, the central server can use the communication protocol described below to stream transforms that should be applied to virtual assets in a version of the 3-D environment 502 that is stored locally at each of the mobile devices 1004. Each of the mobile devices 1004 can then perform their own real-time render of the locally stored 3-D environment 502 to generate their own unique 2-D image 702 of the 3-D environment 502.

The compositing operation can be carried out by each individual mobile device 1004. The first layer of the compositing operation can include images from the real-time video captured by the camera of the mobile device 1004. The second layer of the compositing operation can include the elements of the rendered 2-D image 702 for that mobile device 1004. When composited together, the composited view may be displayed on the display screen of the mobile device 1004. For example, when viewing the first performance area 102 with their naked eye, none of the elements of the 3-D environment 502 (e.g., the clown, the observing character, the flaming balls, etc.) would be visible. However, when the audience member moves their mobile device 1004 in front of their eye to view the first performance area 102 through the mobile device 1004, the elements of the 3-D environment 502 may be displayed as though they were a natural part of the real-world first performance area 102.

FIG. 11 illustrates the first performance area 102 combined with the second performance area 302 in the same physical space, according to some embodiments. Instead of having separate performance areas, one for the live scene elements (e.g., the human character 108) and one for the motion-capture performer, these embodiments place the motion-capture performer 308 in the first performance area 102. Thus, for purposes of this disclosure, certain embodiments may use the terms “first performance area” and “second performance area” or “first real-world environment” and “second real-world environment” to refer to the same physical space in the real world.

Advantages of these embodiments include the fact that the motion-capture performer 308 can navigate around the first performance area 102 without using simulated props or other means to simulate movement around/on objects in the first performance area 102 while physically separated from the actual objects. For example, the motion-capture performer 308 can walk around the table 110 without being required to use a prop simulating the size and geometry of the table 110 or guessing the location of the table in a physically separate space.

In these embodiments, the first performance area 102 can still include the depth cameras 104 and the visible-light camera 106. These devices can perform the same functions described above, such as recording a real-time video stream of the first performance area 102 and capturing depth images used to construct volumetric objects or 3-D models in the corresponding 3-D environment 502. Additionally, some embodiments may use the depth images to generate a volumetric object representing the motion-capture performer 308. This volumetric object can be used to estimate a vertex/wireframe representation of the motion-capture performer 308 that is similar to the vertex/wireframe representation generated by the motion-capture system described above. This allows the immersive content presentation system to instead use data from the depth cameras 104 to determine the motion of the motion-capture performer 308. The resulting motion/position of the frame derived from the depth images can be fed into the game engine in the same way that the frame derived from the motion-capture system is fed into the game engine to generate/control virtual assets that are added to the 3-D environment 502. Alternatively or additionally, the first performance area 102 may include a plurality of motion-capture cameras 304 as depicted in the second performance area 302 of FIG. 3 above. Although not shown in FIG. 11, these motion-capture cameras 304 can be distributed throughout the first performance area 102 in a similar fashion.

In these embodiments, the 3-D environment 502 can be generated using the methods described above to include volumetric objects (e.g., 220, 222, 228), 3-D models (e.g., 210, 212), and virtual assets such as the 3-D model of the clown 506, the 3-D model of the observing character 508, and so forth. A rendered 2-D image 702 can be generated based on a single location of the camera 106 and/or based on multiple locations of individual AR glasses 904 and/or mobile devices 1004 in the audience.

One difference in these embodiments is that when viewing the first performance area 102 with the naked eye, audience members would see the motion-capture performer 308 in the first performance area 102. While in previous embodiments the motion-capture performer 308 may have been seen off to the side in the second performance area 302, these embodiments insert the motion-capture performer 308 into the same scene as the human character 108. This may allow for more lifelike interactions between a digital character controlled by the motion-capture performer 308 and the human character 108 and/or other objects in the first performance area 102 (e.g., the table 110).

FIG. 12 illustrates a composited view 1202 of the first performance area 102 as displayed on a display screen. It should be noted that the composited view 802 generated when the motion-capture performer 308 is in a separate physical space is nearly identical to the composited view 1202 generated when the motion-capture performer 308 is in the same physical space as the first performance area 102. A difference in the process, however, is that the first layer of the compositing operation that includes the real-time, visible-light view of the first performance area 102 would include the motion-capture performer 308. Therefore, the composited view 1202 covers the view of the motion-capture performer 308 with the image of the clown 706. For example, as the arms of the motion-capture performer 308 move in the juggling fashion, the arms of the image of the clown 706 would move in a similar fashion to visually cover the moving arms of the motion-capture performer 308 in the composited view 1202. In another example, if the motion-capture performer 308 walks around the table 110, the image of the clown 706 would also walk around the image of the table 810 such that the view of the motion-capture performer 308 may remain completely covered by the image of the clown 706 in the composited view 1202.

These embodiments that cover a view of the motion-capture performer 308 with a rendered view of a digital character allow the immersive content presentation system to “re-skin” live characters to appear different to the audience through the composited view 1202. Although FIG. 12 uses a human CGI character that is similar in size and build to the motion-capture performer 308, other embodiments are not so limited. For example, some embodiments may re-skin the motion-capture performer 308 with a robot character, a humanoid character, an animal character (e.g., a dinosaur), a vehicle, a piece of scenery, an object, or any other virtual asset.

Additionally, some embodiments can re-skin other objects in the first performance area 102. For example, the depth cameras 104 and/or the motion-capture cameras 304 can visually recognize configurations of props in the first performance area 102 and replace them with 3-D models of objects in the 3-D environment 502. For example, the depth cameras 104 may generate a volumetric object for the table 110 that is recognized as such by the immersive content presentation system. The table 110 may be recognized by comparing it to a predefined volume for the table 110 that is previously provided to the immersive content presentation system. Alternatively or additionally, the table 110 may be visually recognized as such by the motion-capture cameras 304. In either case, the immersive content presentation system can re-skin the table 110 by inserting a new virtual asset into the 3-D environment 502 that would cover the view of the table 110 in the composited view 1202. For example, the table 110 can be replaced in the 3-D environment 502 with a 3-D model of a circus platform. When the motion-capture performer 308 climbs on top of the table 110, it will appear in the composited view 1202 that the image of the clown 706 climbs on top of an image of the circus platform. These operations can visually enhance the first performance area 102 with CGI props and environments that may not be feasible or practical in the real world.

FIG. 13 illustrates a flowchart 1300 of a method for generating an immersive experience that mixes real-world and virtual-world content, according to some embodiments. The method may include receiving a real-time motion or position of a performer in a first real-world environment (1302). The performer may include a motion-capture performer whose movements/position are captured by a motion-capture system. The motion or position of the performer may also be captured by a plurality of depth cameras. The first real-world environment may include the second performance area 302 described above.

The method may also include identifying the motion or position as a predefined motion or position (1304). Some embodiments may encode the motion or position as a 3-D frame comprising vertices and/or wireframe representations of the performer. The 3-D frame may be provided to a game engine that has been altered to include code that recognizes motions of the 3-D frame and/or positions of the 3-D frame by comparison to a predefined set of motions and/or positions. The identified motion and/or position may include hand gestures, body movements, dance moves, fighting actions, stunts, acrobatics, and so forth. The identified motion and/or position may also include more subtle movements, such as finger gestures, head positions and/or rotations, foot movements, or any other position/motion that may be represented by the 3-D frame and/or captured for the performer.

The method may additionally include adding or altering a virtual asset in a 3-D virtual environment in response to identifying the motion or position as the predefined motion or position (1306). The 3-D virtual environment may model a second real-world environment, such as the first performance area 102 described above. The 3-D virtual environment may also comprise the 3-D environment 502 described above. The 3-D virtual environment may include volumetric objects and/or 3-D models of objects that are found in the second real-world environment, such as physical objects, human characters, props, scenery, and/or any other physical characteristics of the second real-world environment. The virtual asset may include one or more 3-D models of digital characters, digital objects, digital scenery, and other objects that can be added and/or moved in the 3-D virtual environment. The virtual asset may also include digital effects, such as flames, ice, weather effects, smoke, fog, explosions, lightning, or any other digital effect. In some embodiments, the virtual asset may include a plurality of virtual assets of different types.

The method may further include rendering a 2-D video stream of the virtual asset in the 3-D virtual environment (1308). In some embodiments, elements from the second real-world environment are removed such that they are not visible in the 2-D video stream. The 2-D video stream may be comprised of individual rendered 2-D images of the 3-D virtual environment that include views of the virtual asset. Images of the virtual asset in the 2-D video stream may include cutouts corresponding to objects that would be visible in a corresponding view of the second real-world environment.

The method may also include causing a real-time video stream to be displayed on the display device (1310). The real-time video stream may include the 2-D video stream of the virtual asset. The real-time video stream may also be composited with a real-time view of the second real-world environment. For example, the 2-D video stream can be composited on a real-time view of the second real-world environment that comprises a video stream captured by a camera of the real-world environment. Additionally, the 2-D video stream can be composited on a real-time view of the second real-world environment that is viewed through a pair of AR glasses such that the real-time view of the second real-world environment comprises the natural view of the second real-world environment as viewed by the eye of an audience member wearing the AR glasses.
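Purely as an illustration of how steps 1302 through 1310 might chain together for one frame, the following hedged sketch uses toy stand-ins for each subsystem; none of these function names reflect an actual API of the system:

```python
# Hypothetical end-to-end sketch of steps 1302-1310; every helper here is
# a toy stand-in for the subsystems described above, not an actual API.

def extract_pose(mocap_frame):
    return tuple(mocap_frame["vertices"])          # (1302) receive motion

def match_gesture(pose, library):
    return library.get(pose)                       # (1304) identify motion

def apply_gesture_action(scene, action):
    scene["assets"].append(action)                 # (1306) add/alter asset

def render_with_cutouts(scene):
    return f"render({scene['assets']})"            # (1308) render 2-D stream

def composite(rendered, camera_frame):
    return f"{rendered} over {camera_frame}"       # (1310) display composite

def process_frame(mocap_frame, camera_frame, scene, gesture_library):
    pose = extract_pose(mocap_frame)
    gesture = match_gesture(pose, gesture_library)
    if gesture is not None:
        apply_gesture_action(scene, gesture)
    return composite(render_with_cutouts(scene), camera_frame)

scene = {"assets": []}
library = {("hands", "arc"): "spawn_flaming_ball"}
frame = {"vertices": ["hands", "arc"]}
print(process_frame(frame, "camera_frame_0", scene, library))
```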

It should be appreciated that the specific steps illustrated in FIG. 13 provide particular methods of generating an immersive experience that is reactive to the motion and/or position of a performer according to various embodiments. Other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 13 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.

FIG. 14A illustrates a block diagram 1400 of an immersive content presentation system, according to some embodiments. The system may include one or more image capture cameras 1408 that capture real-time video images of the first performance area 102 as described above, such as camera 106. The system may also include a depth capture system 1406, which may include the depth cameras 104. The system may also include a motion-capture system 1404 that includes the motion-capture cameras 304.

The output of the motion-capture system 1404 may be provided to a central server 1430. The central server 1430 may include a game engine 1412 that has been modified to accept inputs from a motion-capture system 1404. Specifically, the motion-capture frame that is provided by the motion-capture system 1404, comprising one or more vertices corresponding to visual fiducials captured by the motion-capture system 1404, can be provided to the game engine 1412. In some embodiments, the alterations made to the game engine 1412 may be contained in a plug-in 1410; however, this is not necessary. Other embodiments may provide these modifications to the game engine 1412 using other methods, such as altering the game engine code itself, generating a custom game engine, and/or the like.

The game engine 1412 may also accept virtual assets that are retrieved from a content repository 1416. The content repository 1416 can be populated by a content creation system 1402. In some embodiments, the content creation system 1402 may be integrated with the game engine 1412 such that users can modify a virtual environment or game environment using tools provided by the game engine 1412. The content repository 1416 may include various virtual assets, such as effects, asset animations, 3-D models, and so forth.

The game engine 1412 can be configured to compare positions and/or motion sequences of the 3-D frame provided from the motion-capture system 1404 to a predefined library of positions and/or motion sequences. Each of the entries in the predefined library of positions and/or motion sequences can also include a corresponding virtual asset that should be added to a 3-D virtual scene or an action that should be taken relative to an existing virtual asset in a 3-D virtual scene. For example, one of the actions that may be taken is to alter/blend the animation state for an AI-driven digital character.
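For instance, such a library could be organized as a mapping from each predefined motion/position entry to the scene action it triggers, as in this hypothetical sketch (the entry names and action formats are invented):

```python
# Hypothetical library: each entry pairs a predefined motion/position
# template with the scene action the game engine should take on a match.
GESTURE_LIBRARY = {
    "juggle_arc": {"action": "spawn", "asset": "flaming_ball"},
    "lunge_push": {"action": "spawn", "asset": "fireball"},
    "arm_raise": {"action": "blend_animation",
                  "character": "ai_dancer", "state": "spin"},
}

def on_gesture_identified(scene: dict, gesture_name: str) -> None:
    """Dispatch the action associated with an identified gesture."""
    entry = GESTURE_LIBRARY[gesture_name]
    if entry["action"] == "spawn":
        # Add a new virtual asset to the 3-D virtual scene.
        scene.setdefault("assets", []).append(entry["asset"])
    elif entry["action"] == "blend_animation":
        # Alter/blend the animation state of an AI-driven character.
        scene.setdefault("animation_states", {})[entry["character"]] = entry["state"]

scene = {}
on_gesture_identified(scene, "juggle_arc")
on_gesture_identified(scene, "arm_raise")
print(scene)
```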

The central server 1430 may also include a representation of a 3-D virtual scene 1414. Some embodiments may store and manipulate the 3-D virtual scene 1414 as part of, or using, the game engine 1412. Some embodiments may receive transforms provided from the game engine to manipulate virtual assets in the 3-D virtual scene 1414. For example, the game engine 1412 may recognize motions and/or positions from the motion-capture system 1404 and generate transforms or add virtual assets that may be applied to the 3-D virtual scene 1414. As described above, the depth capture system 1406 may also generate additional digital items that may be added to the 3-D virtual scene 1414, such as volumetric objects and/or 3-D models of objects corresponding to real-world objects in the first performance area 102.

In some embodiments, the central server 1430 may also include a rendering engine 1418 that can generate one or more 2-D images of the 3-D virtual scene 1414 from various locations. The various locations may correspond to real-world locations/orientations of display devices used by audience members viewing the first performance area 102. Note that in some embodiments, the rendering engine 1418 may be additionally or alternatively moved from the central server 1430 to one or more of the individual display devices or mobile devices used by audience members.

In some embodiments, the central server 1430 may also include a compositing engine 1420. Note that some embodiments may move the compositing engine 1420 from the central server 1430 to one or more individual mobile/display devices used by the audience members. The compositing engine 1420 may composite the rendered 2-D images from the rendering engine 1418 on top of images captured from the image capture cameras 1408. These images may be composited in real time as part of a live video stream. The output of the compositing engine 1420 can be displayed on one or more display devices 1422, such as a display screen that is visible to audience members.

FIG. 14B illustrates a block diagram 1424 of an alternate arrangement of the elements of the immersive content presentation system, according to some embodiments. This embodiment is similar to the arrangement in block diagram 1400, except the system has been customized for use with mobile devices that include their own image capture cameras 1408, such as smart phones or tablet computers. In this arrangement, position sensors on the mobile devices 1426 can provide location and/or orientation information to the rendering engine 1418 such that the rendering engine 1418 can generate a plurality of 2-D images of the 3-D virtual scene 1414 corresponding to the locations and/or orientations of the mobile devices 1426.

The rendering engine 1418 can then provide the rendered 2-D images to a compositing engine 1420 that operates on the mobile devices 1426. The rendered 2-D images can be composited with real-time images captured by image capture cameras 1408 that are part of the mobile devices 1426, such as cameras on a smart phone. The resulting composited view can be displayed on the display device 1422, such as a screen of the mobile devices 1426. Note that in some embodiments, the position sensors can be removed from the mobile device 1426 and made part of the first performance area 102 as described above. Additionally, some embodiments may move the rendering engine 1418 to the mobile device 1426 such that the central server transmits transforms that may be applied to the 3-D virtual scene 1414 by the rendering engine 1418 at the mobile device 1426.

FIG. 14C illustrates a block diagram 1432 of an alternate arrangement of the elements of the immersive content presentation system, according to some embodiments. This embodiment is similar to the arrangement in block diagram 1400, except the system has been customized for use with AR devices 1428, such as AR glasses. As described above, the relative position and/or location of the AR devices 1428 may be determined by the AR devices 1428 themselves and/or by other devices present near the first performance area 102. The location and/or orientation of each of the AR devices 1428 can be provided to the rendering engine 1418 to generate rendered 2-D images of the 3-D virtual scene 1414 that are based at least in part on the location and/or orientation of each of the individual AR devices 1428. The rendered images may then be displayed on a display device 1422 of the AR devices 1428, for example, by being projected on a lens of a pair of AR glasses such that the rendered images are composited with a natural view of the first performance area 102 as described above.

One of the innovations of the immersive content presentation system is the ability for performers to drive the actions and movements of the other characters in various 3-D environments. Traditional game engine technology uses AI-driven character locomotion. Specifically, for non-player characters, the game engine will employ one or more state machines that can be used to blend between different animation states for any CGI character in the game environment. A series of animations can be combined together in an animation state graph, and transitions to blend between different animation states in the state graph have traditionally been triggered by controller inputs from the user.

In one example, a first animation state for a non-player CGI character may include an animation of the character standing still, breathing in and out slowly. A second animation state for the CGI character may include the character walking forward at an average speed, swinging their arms by their side. A third animation state for the CGI character may include the character running forward at a rapid pace. Each of these animation states may represent a single state in the animation state graph governing the behavior of the CGI character. To transition between animation states in the state graph, the game engine may receive an input provided by a user control device, such as a joystick or game controller. Continuing with this example, the user can cause the displayed animation of the CGI character to transition from the first animation state where the CGI character stands still to the second animation state where the CGI character walks forward by actuating a joystick slightly forward on a game controller. Similarly, the user can cause the displayed animation of the CGI character to transition from the second animation state where the CGI character walks forward to the third animation state where the CGI character runs at a rapid pace by further actuating the joystick fully forward on the game controller.
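A toy version of this controller-driven state graph, with invented joystick thresholds, might look like the following:

```python
# Toy animation state graph: idle -> walk -> run, driven by how far the
# joystick is pushed forward (0.0 = released, 1.0 = fully forward).
TRANSITIONS = [
    (0.0, "idle"),   # below the walk threshold: stand still, breathe
    (0.2, "walk"),   # slight forward actuation: walk at an average speed
    (0.8, "run"),    # full forward actuation: run at a rapid pace
]

def animation_state(joystick_forward: float) -> str:
    """Pick the animation state for the current controller input."""
    state = "idle"
    for threshold, name in TRANSITIONS:
        if joystick_forward >= threshold:
            state = name
    return state

for stick in (0.0, 0.3, 1.0):
    print(stick, "->", animation_state(stick))   # idle, walk, run
```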

Similarly, game engine embodiments may generate digital effects in the corresponding 3-D virtual scene of the game environment as part of a pre-scripted sequence of events in a scene graph for a 3-D virtual scene. For example, certain storyboard events in a scene graph for a virtual scene can cause virtual effects to appear in the scene. For example, after a predefined amount of time or in response to a predefined event, the scene graph may dictate that a weather event occurs, lightning strikes, an explosion occurs, and so forth. However, these virtual effects are not generated by an external user stimulus in most cases.

The embodiments described herein alter this traditional functionality of the game engine to (1) receive movements and/or positions captured by a motion-capture system, (2) compare movements and/or positions of the motion-capture frame of a motion-capture performer to a library of predefined motions and/or positions, and (3) cause a transition between predefined animation states in a CGI character or generate a virtual effect in the 3-D virtual scene. As described above, this altered functionality can be provided as a plug-in to the game engine. For example, commercially available game engines, such as the Unreal Engine®, provide a programmer interface to accept plug-ins that can alter their functionality. Alternatively or additionally, a custom game engine may be designed that includes this functionality, or the code of existing game engines can be altered to execute this functionality.

FIG. 15A illustrates an example of a motion sequence executed by a motion-capture performer that may be used to trigger actions by a game engine, according to some embodiments. This figure illustrates two different poses performed by the motion-capture performer. A first pose 1502 shows the motion-capture performer standing with his arms in a ready position. A second pose 1504 shows the motion-capture performer lunging forward with his arms in an outstretched position. These poses show the frame that may be captured by the motion-capture system, comprised of vertices that are connected to form a wireframe skeleton representing the motion-capture performer in a 3-D environment, referred to herein as a “motion-capture frame,” or simply “frame.” Furthermore, FIG. 15A includes a silhouette around the frame in each pose. This silhouette is for illustrative purposes only to provide a clear picture of the pose of the motion-capture performer.

The first pose 1502 and the second pose 1504 may be captured sequentially in successive images captured by the motion-capture system. Additional intermediate images may also have been captured between the first pose 1502 and the second pose 1504 that are not illustrated in FIG. 15A. For example, a time interval, such as 1 second, may have passed between capturing pose 1502 and pose 1504 by the motion-capture system. Thus, pose 1502 may represent a starting point for a motion performed by the motion-capture performer, and pose 1504 may represent an ending point for the motion. As described above, these poses 1502, 1504 can be provided as successive inputs to the game engine, and the game engine can compare this motion between pose 1502 and pose 1504 to a predefined library of known motions and/or locations to drive various elements of the 3-D virtual scene.

FIG. 15B illustrates a detailed view of portions of the motion-capture frame from the first pose 1502 and the second pose 1504 to demonstrate how the game engine can identify a predefined motion, according to some embodiments. The dashed lines/vertices represent a portion of the motion-capture frame from the first pose 1502, and the solid lines/vertices represent a portion of the motion-capture frame from the second pose 1504. These two frames are superimposed on each other in FIG. 15B to show how these vertices can move in the 3-D virtual scene to create an identifiable motion. These portions of the motion-capture frames correspond to the arms of the character in FIG. 15A.

In some embodiments, the game engine can recognize motions based on a single pose or the location of vertices in a frame. For example, when the motion-capture performer holds their arms outstretched in front of them with their hands extended in opposite directions, the game engine can compare the directions and distances between vertices to determine that the current pose matches a predefined pose. In this example, the game engine can determine a distance 1528 between vertex 1524 and vertex 1526 of the frame and can determine a distance 1505 between vertex 1520 and vertex 1522 of the frame. The distances between these vertices can indicate that the arms of the motion-capture performer are within a threshold distance of each other, and the orientation of these distance vectors (e.g., approximately perpendicular to the ground) can indicate that the arms are outstretched in front of the motion-capture performer. Similar position/distance determinations can be made regarding the vertices representing the hands of the motion-capture performer to determine that the hands are outstretched in opposite directions. This is just one example of how the distance and orientation of the collection of vertices in the motion-capture frame can be used to determine a current location and/or pose of the motion-capture performer. The process used in this example can be used to identify any pose and/or any other position that can be defined by a collection of vertices for a motion-capture frame. For example, this process can be used to identify when the motion-capture performer points a finger in a certain direction, enters a predefined area of the second performance area 302, contacts a real-world object in either of the performance areas 102, 302, contacts a virtual-world object in the 3-D virtual environment, and so forth.
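A minimal sketch of this kind of vertex-distance test, using an invented vertex layout and thresholds rather than the reference numerals of FIG. 15B:

```python
import numpy as np

# Invented vertex layout for a toy motion-capture frame.
frame = {
    "l_wrist": np.array([0.2, 1.3, 0.6]),
    "r_wrist": np.array([0.2, 1.1, 0.6]),
    "l_hand_tip": np.array([0.2, 1.45, 0.6]),
    "r_hand_tip": np.array([0.2, 0.95, 0.6]),
}

def hands_forward_opposed(f: dict, max_gap: float = 0.3) -> bool:
    """Toy static-pose test: wrists close together, hands pointing apart.

    Checks (1) the distance between the two wrist vertices is within a
    threshold, and (2) the hand direction vectors point in roughly
    opposite directions (dot product of unit vectors near -1).
    """
    gap = np.linalg.norm(f["l_wrist"] - f["r_wrist"])
    l_dir = f["l_hand_tip"] - f["l_wrist"]
    r_dir = f["r_hand_tip"] - f["r_wrist"]
    cos = np.dot(l_dir, r_dir) / (np.linalg.norm(l_dir) * np.linalg.norm(r_dir))
    return bool(gap < max_gap and cos < -0.9)

print(hands_forward_opposed(frame))   # -> True for this example frame
```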

In addition to identifying static poses, the game engine can also analyze successive poses of the frame to identify motions performed by the motion-capture performer. For example, by calculating distances between vertices in successive poses, the game engine can calculate a velocity, acceleration, trajectory, and/or motion path for each vertex. Motion paths of vertices can then be compared to predefined motion paths to identify predefined gestures. A gesture may be defined as one or more predefined motion paths of vertices from a motion-capture frame over a plurality of poses over time. Motion vectors can be calculated for corresponding vertices in the first pose 1502 and the second pose 1504. For example, FIG. 15B illustrates motion vectors 1505, 1507, 1509 as the arms of the motion-capture performer are extended away from the body of the motion-capture performer.

In the example described above, the motion-capture performer 308 is able to make a juggling motion with their hands. In the 3-D environment 502, this would correspond to the vertices of the hands of the motion-capture frame 404 of the motion-capture performer 308 moving back and forth in an arc motion at a predetermined height relative to the motion-capture performer 308. Motion vectors that correspond to this arced back-and-forth motion can be stored as a template for the predefined gesture by the game engine. When the motion-capture performer 308 performs a similar motion, the game engine can take the motion vectors calculated for the current movement of the motion-capture performer 308 and compare them to the template for the predefined gesture. If they match within a threshold amount of the template motion vectors for the gesture (e.g., if the motion vectors are within 10%, 20%, 25%, or some other predetermined threshold), then the game engine can execute one or more actions in response to the identified gesture. Some embodiments may allow the identified motion vectors to be rotated or translated such that the motion-capture actor 308 can be positioned in any orientation relative to an origin, and the game engine may still identify predefined motions.
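One plausible way to score a live motion against a stored gesture template, assuming both are sampled as equal-length arrays of motion vectors and using the percentage-style tolerance mentioned above:

```python
import numpy as np

def matches_template(live: np.ndarray,
                     template: np.ndarray,
                     tolerance: float = 0.2) -> bool:
    """Compare live motion vectors to a stored gesture template.

    live, template: Nx3 arrays of motion vectors for corresponding
    vertices/poses. The gesture matches if every live vector deviates
    from its template vector by less than `tolerance` (e.g., 20%) of
    that template vector's own magnitude.
    """
    deviation = np.linalg.norm(live - template, axis=1)
    allowed = tolerance * np.linalg.norm(template, axis=1)
    return bool(np.all(deviation <= allowed))

template = np.array([[0.0, 0.3, 0.1], [0.0, -0.3, 0.1]])
live_ok = template * 1.1       # 10% larger, same direction -> matches
live_bad = template * -1.0     # reversed motion -> no match
print(matches_template(live_ok, template))    # True
print(matches_template(live_bad, template))   # False
```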

Some embodiments can use the movements of the motion-capture performer 308 to train the game engine to recognize new predefined motions and/or positions. For example, to train the game engine to recognize the juggling motion, the motion-capture performer 308 can move their hands back and forth in an arc pattern in front of them. This gesture can be recorded by the game engine. Some embodiments may also prompt the motion-capture performer 308 to repeat the gesture a plurality of times, and the game engine can average the motion vectors from each iteration of the gesture to generate an average set of motion vectors defining the gesture, as well as a standard deviation that can be used as a threshold when comparing subsequent gestures to the predefined gesture after the training session.
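A sketch of that training loop, assuming each repetition is captured as an array of motion vectors; the per-component standard deviation then becomes a matching threshold:

```python
import numpy as np

def train_gesture(repetitions: list) -> tuple:
    """Average repeated demonstrations into a gesture template.

    repetitions: list of Nx3 arrays, one per recorded repetition of the
    gesture. Returns (mean_vectors, std_vectors); the standard deviation
    can serve as the comparison threshold for live performances.
    """
    stack = np.stack(repetitions)   # R x N x 3
    return stack.mean(axis=0), stack.std(axis=0)

reps = [
    np.array([[0.00, 0.30, 0.10]]),
    np.array([[0.02, 0.28, 0.12]]),
    np.array([[-0.02, 0.32, 0.08]]),
]
mean, std = train_gesture(reps)
print(mean)   # averaged juggling-arc motion vector
print(std)    # per-component tolerance learned from the demonstrations
```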

FIG. 15C illustrates an example of virtual effects that can be triggered based on identifying predefined motions and/or positions, according to some embodiments. Each of the predefined motions and/or positions that are stored by the game engine and identified when performed by a motion-capture performer can be associated with one or more actions that can be executed by the game engine in the 3-D virtual scene. In some embodiments, these actions may include adding virtual assets to the 3-D virtual scene that were not present prior to identifying the motion/position. These virtual assets may include additional CGI characters, scenery, props, objects, and/or other 3-D models or objects. These virtual assets may also include virtual effects that are applied to the scene.

FIG. 15C illustrates one example of a virtual asset that may be added to the scene as well as effects that may be applied. In this example, the motion-capture frame has been replaced or re-skinned in the 3-D virtual scene with a CGI character in a first pose 1506 and a second pose 1508 corresponding to the poses 1502, 1504 of the motion-capture frame. The motion-capture frame can drive the motion of the CGI character. Additionally, after identifying the predefined gesture depicted in FIG. 15B, the game engine can insert a new digital object into the 3-D virtual scene. In this example, the digital object may include a fireball 1510 that is emitted from the hands of the CGI character when completing the identified gesture/pose. The new digital object may comprise a 3-D model that is added to the scene and is subject to the physics engine of the game engine. For example, the fireball 1510 may have a virtual mass and velocity when it is initially generated from the hands of the CGI character. The trajectory of the fireball 1510 may be governed by the physics engine like any other object in the 3-D virtual scene after it is added to the 3-D virtual scene.
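To make the physics-engine behavior concrete, here is a hedged sketch of a spawned projectile with mass and an initial velocity following a ballistic arc under simple Euler integration; all values are arbitrary:

```python
import numpy as np

class Projectile:
    """Toy physics object, e.g., the fireball emitted on a gesture."""
    def __init__(self, position, velocity, mass=1.0):
        self.position = np.asarray(position, dtype=float)
        self.velocity = np.asarray(velocity, dtype=float)
        self.mass = mass

    def step(self, dt: float, gravity=np.array([0.0, -9.8, 0.0])):
        # Semi-implicit Euler: a stand-in for the engine's physics step.
        self.velocity = self.velocity + gravity * dt
        self.position = self.position + self.velocity * dt

# Spawned from the CGI character's hands with forward/upward velocity.
fireball = Projectile(position=[0.0, 1.5, 0.0], velocity=[4.0, 2.0, 0.0])
for _ in range(30):              # simulate half a second at 60 Hz
    fireball.step(dt=1.0 / 60.0)
print(fireball.position)         # the fireball arcs away under gravity
```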

In addition to generating the new fireball 1510 object, virtual effects may be applied to the 3-D virtual scene in response to identifying the motion/position of the motion-capture frame. For example, an explosive sound or burst of light may be added to the 3-D virtual scene when the fireball 1510 is generated. The fireball 1510 itself may generate virtual effects, such as flames, smoke, and a burning sound. Other virtual effects may include a flash of light, lightning, sounds, strobe effects, and/or other effects.

FIG. 16A illustrates how a motion-capture frame 1606 can interact with existing virtual objects in the 3-D virtual environment, according to some embodiments. In this example, the motion-capture performer 308 can stand with their arms down to their side. In a motion-capture area, such as the second performance area 302, the motion-capture performer 308 may be provided with a screen that shows a real-time view of the 3-D virtual environment. On the screen, the motion-capture performer 308 may see various virtual assets in the 3-D virtual scene, such as a chair 1602. However, because the chair 1602 does not exist in the second performance area 302, some embodiments may place props that are similar in size, such as a real-world chair, at the corresponding locations in the second performance area 302.

FIG. 16B illustrates a position that may be recognized by the game engine, according to some embodiments. As the motion-capture performer 308 raises their left hand and places it out to their side, the game engine can recognize this as a predetermined position that triggers an action by the game engine. In this case, the motion-capture frame 1606 corresponding to the motion-capture performer 308 can place its left hand 1604 out to the side. When the vertices of the left hand 1604 are within a threshold distance of the virtual representation of the chair 1602, the game engine can identify this as a predefined position. One action that may be taken is to simulate contact between the hand 1604 and the chair 1602 in the 3-D virtual environment. This may be accomplished using virtual friction properties that are enforced by the physics engine of the game engine. This may also be accomplished by creating a temporary attachment between the hand 1604 and the chair 1602.

FIG. 16C illustrates how subsequent motions of the motion-capture frame 1606 can generate actions by the game engine in the 3-D virtual environment, according to some embodiments. In this example, the motion-capture performer 308 can further extend their left hand out to their side. In the 3-D virtual environment, the corresponding motion-capture frame 1606 can extend its left hand 1604 out to the side in a corresponding fashion. However, because the left hand 1604 is connected to the chair 1602 by friction or otherwise, the chair 1602 may move away from the motion-capture frame 1606. If the motion-capture performer 308 moves their left hand back and forth, the virtual chair 1602 would also move back and forth with the left hand 1604 in the 3-D virtual environment. The motion of the chair 1602 can be governed by the physics engine of the game engine in response to virtual forces applied by the motion-capture frame 1606.

FIG. 16D illustrates how the motion-capture frame 1606 can become uncoupled from a virtual object, according to some embodiments. In this example, as the motion-capture performer 308 pushes their left hand out to their side with more than a threshold amount of force or speed, the game engine can recognize this velocity vector of the hand 1604 as an identified predefined vector that is sufficient to detach the left hand 1604 from the virtual representation of the chair 1602. As governed by the physics engine of the game engine, the chair 1602 can accelerate away from the motion-capture frame 1606 by rolling away on the ground.
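
The attach/detach behavior of FIGS. 16B-16D reduces to two comparisons against thresholds. The sketch below is illustrative only; the specific distance and speed values are assumptions.

```python
import numpy as np

ATTACH_DISTANCE = 0.15   # meters: hand close enough to touch the object
DETACH_SPEED = 2.5       # m/s: a push faster than this breaks the coupling

def update_coupling(hand_pos: np.ndarray, hand_vel: np.ndarray,
                    obj_pos: np.ndarray, attached: bool) -> bool:
    """Return the new attachment state between a hand and a virtual object."""
    if not attached and np.linalg.norm(hand_pos - obj_pos) <= ATTACH_DISTANCE:
        return True   # vertices within threshold distance: simulate contact
    if attached and np.linalg.norm(hand_vel) >= DETACH_SPEED:
        return False  # velocity exceeds the predefined threshold: release
    return attached
```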

It should be noted that this same concept was demonstrated above with the juggling motion of the 3-D model of the clown 506. As the motion-capture performer 308 moved their arms back and forth, the game engine generated the 3-D models of the flaming balls 504 in response to this identified, predefined motion. The flight path and trajectory of the flaming balls 504 may be governed by the physics engine, and the game engine may recognize when the fingers of the motion-capture actor 308 close around the ball and subsequently release the ball as they are juggled to apply virtual forces to the balls.

FIG. 17A illustrates how the actions of other CGI characters in the 3-D virtual scene can be governed by the motion and/or position of a motion-capture actor, according to some embodiments. A motion-capture frame 1702 may correspond to a motion-capture performer 308 in the second performance area 302 performing a ballet dance. For example, the motion-capture performer 308 may be spinning clockwise in a circle on one foot. At the same time, the 3-D virtual environment may include two CGI characters, or non-player characters, that follow individual animation graphs to govern their motions as described above. A first dancer 1704 may be animated as a CGI character having a 3-D model in the 3-D virtual environment that is poseable according to a variety of animations. Similarly, a second dancer 1706 may also be animated in the 3-D virtual environment. They may execute a current animation in their assigned animation graphs such that they spin counterclockwise in the circle on two feet.

FIG. 17B illustrates an identified motion of the motion-capture frame 1702 that changes the state in the animation graph of each of the CGI characters, according to some embodiments. Generally, CGI characters may be associated with many animation sequences (e.g., hundreds) that can be used to program the motion and behavior of CGI characters according to inputs received by the game engine. These animation sequences can be strung together in the animation graph to govern how the CGI characters respond to inputs to the game engine. The choice of which animation sequence to execute and the specific manner in which the game engine blends from one animation sequence to the other is determined by specified control factors, such as game control inputs. However, in the embodiments described herein, the game engine has been modified to accept movements and positions of a motion-capture frame and identify predefined motions and/or positions that trigger transitions in the animation graphs and/or the manner in which they should be blended.

In this example, the motion-capture performer 308 may stop spinning in a clockwise motion and perform a new dance move that is recognized as a predefined motion sequence or position by the game engine. For example, the pose depicted in FIG. 17B may result from a predefined motion sequence that is identified by the game engine, or may itself be a predefined pose or position that is identified by the game engine. In response to this identification, the animation sequences for the first dancer 1704 and the second dancer 1706 can be changed in their respective animation graphs such that they perform different dance moves.
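
One way to picture the modified animation graph is as a state machine whose transitions are keyed by the identifiers of recognized motions. This sketch is a simplification under assumed state and trigger names; a real animation graph would also encode blending parameters.

```python
class AnimationGraph:
    """Minimal animation state machine keyed by identified-motion triggers."""

    def __init__(self, start: str, transitions: dict):
        self.state = start
        self.transitions = transitions  # (current_state, trigger) -> next_state

    def on_identified_motion(self, trigger: str) -> str:
        self.state = self.transitions.get((self.state, trigger), self.state)
        return self.state

# The second dancer's graph from this example, with assumed names.
dancer = AnimationGraph(
    start="spin_ccw_two_feet",
    transitions={("spin_ccw_two_feet", "new_dance_move"): "dance_move_b"},
)
dancer.on_identified_motion("new_dance_move")  # state becomes "dance_move_b"
```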

This modification to the game engine solves a technical problem that existed in the art. Specifically, when blending human and virtual performances, the human had to follow the script and timing of the virtual performance. These embodiments solve this problem by allowing the human element in the scene to drive the actions of the CGI characters. This allows for a more lifelike, fluid, and interactive exchange between CGI characters and human characters. Furthermore, as described below, this allows audience members or other human participants that are not aware of any scripted performance to interact with virtual characters through their motions or gestures.

FIG. 18 illustrates a flowchart 1800 of a method for governing virtual animations by identifying predefined motions/positions of a motion-capture performer, according to some embodiments. The method may include causing a virtual object in the 3-D virtual environment to be in a first animation state in an animation graph (1802). The virtual object may be a CGI character, a CGI animal, a CGI vehicle, a scenery element, and/or any other virtual object. The virtual object may also be a virtual effect object that is executed in the 3-D virtual scene. In some embodiments, the first animation state may include a state wherein the object is not yet active or visible in the 3-D virtual scene.

The method may also include receiving a real-time motion or position of a performer in a first real-world environment (1804). The performer may include a motion-capture performer that is captured by a motion-capture system. The motion or position of the performer may also be captured by a plurality of depth cameras. The first real-world environment may include the second performance area 302 described above.

The method may also include identifying the motion or position as a predefined motion or position (1806). Some embodiments may encode the motion or position as a 3-D motion-capture frame comprising vertices and/or wireframe representations of the performer. The 3-D frame can be provided to a game engine that has been altered to include code that recognizes motions of the 3-D frame and/or positions of the 3-D frame by comparison to a predefined set of motions and/or positions. The identified motion and/or position may include hand gestures, body movements, dance moves, fighting actions, stunts, acrobatics, and so forth. The identified motion and/or position may also include more subtle movements, such as finger gestures, head positions and/or rotations, foot movements, or any other position and/or motion that may be represented by the 3-D frame and/or captured for the performer. Motions, gestures, positions, etc., may also be identified by the speed of a movement, such as motions characterized as movements that stop momentum, movements that increase momentum, movements that are performed slower than momentum would normally imply, movements that are continuous but that have low momentum, discontinuous or “jerky” movements, vibrations or jitters, and so forth. Motions or positions may also be identified by a posture or pose of the motion-capture frame, such as a positioning of the head, neck, hips, chest, arms, and/or legs. Other poses or postures that may be recognized may include a posture that is relaxed or balanced over the chest or hips, a folded posture where the chest curves inwards over the hips, and arm positions such as in, out, extended, folded, and so forth. Head gestures can also be recognized, such as head nods, head turns, head shakes, nuzzles, etc.

Additional types of motion performed by the motion-capture frame can be recognized as predefined motions, such as crawling on all fours, lying down, standing up, sitting down, stretching, and so forth. A wide variety of athletic motions may also be trained and recognized by the game engine, such as throwing, catching, tackling, jumping, shooting a ball, kicking a ball, wrestling moves, and so forth. Interactions between characters may generate identifiable predefined motions, such as kicking, hitting, blocking, and so forth. Some embodiments may also identify motions that mimic nonhuman motions, such as flapping a character's arms, jumping like an animal, creating “claws” with the fingers, and so forth. These motions can be used particularly when the motion-capture performer is re-skinned to look like an animal or other creature to trigger virtual motions or effects. Virtually any other type of movement or position may be trained by the game engine and identified to trigger new virtual effects or govern existing virtual objects/characters. The list above is provided merely by way of example and is not meant to be limiting.

The method may further include causing the virtual object in the 3-D virtual environment to transition to a second animation state in the animation graph (1810). In some embodiments, the transition may be triggered by identifying the motion or position as the predefined motion or position. The second animation state may include causing the virtual object to appear or a virtual effect to execute.
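
Putting the steps of flowchart 1800 together, a single frame of processing might look like the sketch below, which reuses the AnimationGraph class from the earlier sketch. The identify callable stands in for the engine's gesture recognizer and is an assumption, not an API from the disclosure.

```python
from typing import Callable, Optional

def run_frame(identify: Callable[[object], Optional[str]],
              mocap_frame: object,
              npc: AnimationGraph) -> str:
    # (1802) the NPC is already in some animation state: npc.state
    trigger = identify(mocap_frame)        # (1804)/(1806) receive and identify
    if trigger is not None:
        npc.on_identified_motion(trigger)  # (1810) transition the graph
    return npc.state
```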

It should be appreciated that the specific steps illustrated in FIG. 18 provide particular methods of controlling a 3-D virtual environment using the motions/positions of a motion-capture performer according to various embodiments of the present invention. Other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 18 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.

The embodiments described thus far use a single first performance area 102 and/or a single second performance area 302 with a single motion-capture performer 308 that triggers virtual assets in the 3-D environment 502 and integrates virtual assets in the 3-D environment 502 with the real-world elements of the first performance area 102. The audience viewing the composited views of the first performance area 102 may include a plurality of individuals. However, these embodiments are presented merely as simplified examples and are not meant to be limiting. Other embodiments may include a large number of audience members, a plurality of motion-capture performers, a plurality of virtual assets, and a plurality of performance areas that can all be integrated together using the 3-D environment 502 to generate composited views. This sharing of physical and virtual environments may be referred to as a “shared reality” experience or system.

FIG. 19 illustrates one example of a multi-venue distribution of the immersive content presentation system, according to some embodiments. This example may include a plurality of performance areas, which may be referred to specifically in this example as a first stage 1901, a second stage 1902, a third stage (not shown), and so forth. Each of these stages 1901, 1902 may be equipped with all of the devices described above as part of the immersive content presentation system. Specifically, each of these stages 1901, 1902 may be equipped with depth cameras, visible-light image cameras, motion-capture cameras, and so forth. Additionally, each of the stages 1901, 1902 may be presented in front of live audiences 1916, 1918 having a large number of human participants watching the stages 1901, 1902. Each of the stages 1901, 1902 may be remotely located from each other, such as located in different states or located in different cities. Generally, each of the stages 1901, 1902 may be located in separate physical facilities or buildings in different geographic locations or separated by at least 0.1 miles.

Although not shown explicitly in FIG. 19, each of the stages 1901, 1902 may include human performers, props, scenery, and other real-world objects that can be re-created as part of the 3-D environment 502. In this example, each of the stages 1901, 1902 has been presented as empty for the sake of simplicity and clarity. If a human character and/or physical object is present on the first stage 1901 but not on the second stage 1902, then the composited view of the second stage 1902 as seen by its audience may include renders of 3-D objects that are inserted into the 3-D environment 502 based on the human character and/or physical object from the first stage 1901. Thus, elements from both of the stages 1901, 1902 can be re-created digitally on the other stages such that the composited view of each of the stages 1901, 1902 looks the same regardless of whether they are viewed by the audience 1916 or the audience 1918.

In addition to the stages 1901, 1902, the immersive content presentation system may include one or more second performance areas 302 specifically designed for motion-capture performers 308. In this example, the motion-capture performer 308 in the second performance area 302 may act as the performer to be seen on the stages 1901, 1902. For example, the motion-capture performer 308 may be portrayed as an MC, a musician in a band, a comedian, a political candidate, an athlete, and/or any other type of real-world performer. In some embodiments, the second performance area 302 may be equipped with digital screens 1910, 1912 to provide real-time, real-world video feeds of the audiences 1916, 1918. This may allow the motion-capture performer 308 to view the audiences 1916, 1918 and see their reactions to his/her performance. Alternatively or additionally, the motion-capture performer 308 may wear a pair of AR glasses 1904 and see a composited view of members of the audiences 1916, 1918 in front of them such that they may interact with members of the audiences 1916, 1918 in a realistic fashion as described in greater detail below.

FIG. 20 illustrates a composited view of the second stage 1902 with a rendered image of the re-skinned motion-capture performer 308, according to some embodiments. In this example, the motion-capture performer 308 may generate a motion-capture frame 404 in the 3-D environment 502 as described above. The immersive content presentation system can then re-skin the motion-capture frame 404 with a CGI representation of a magician. The 3-D model of the magician may have its movements and interactions governed by the movements of the motion-capture frame 404.

In some embodiments, each member of the audience 1918 may have their own mobile device, such as a smart phone, a pair of VR goggles, a pair of AR glasses, and so forth. In some embodiments, the second stage 1902 may also be equipped with a large display screen that displays a single composited view for all of the members of the audience 1918. In this example, it can be assumed that each member of the audience 1918 is wearing a pair of AR glasses. Thus, the location and/or orientation of each pair of AR glasses can be used to render individual views of the 3-D model of the magician using the process described above. A 2-D image of the magician 2002 may be composited on the AR glasses in front of the view of the second stage 1902. Therefore, each member of the audience 1918 wearing the AR glasses would see the image of the magician 2002 standing on the second stage 1902 as though they were there in person. A similar view may be provided for the audience 1916 of the first stage 1901.

As mentioned above, the motion-capture performer 308 may also be wearing a pair of AR glasses 1904. Any objects that are present on the second stage 1902 can be captured by the depth cameras or preprogrammed into the 3-D environment 502 and re-created in the second performance area 302 by compositing them on the view of the AR glasses 1904 worn by the motion-capture performer 308. This provides a bidirectional, shared, augmented reality experience for both the motion-capture performer 308 and the members of the audience 1918. Additionally, it should be understood that members of the audience 1916 in front of the first stage 1901 would see the same view of the image of the magician 2002 composited on their AR glasses and standing on the first stage 1901.

Note that the use of AR glasses is provided merely as one example. Members of the audience 1916 may also view the first stage 1901 through a smart phone screen/camera, through a tablet, and so forth. Also note that the use of two stages 1901, 1902 is also provided merely as a simplified example. It should be understood that numerous other stages and audiences can be linked to this single 3-D environment 502 and see the same or similar interactive content in real time. In addition to large audiences, the performance may also be linked to many small audiences, such as single users at home viewing the performance through a pair of VR goggles, on a computer screen, etc.

The motion-capture performer 308 can view the audience 1918 on the display screens 1910, 1912 in the second performance area 302. Alternatively or additionally, the depth cameras, motion-capture cameras, etc., of the second stage 1902 may also have a range that extends into the audience 1918. Therefore, motions and positions of various members of the audience 1918 can be captured and re-created as 3-D models or volumetric objects and placed into the 3-D environment 502. This allows the audience 1918 to be re-created in the 3-D environment 502 and composited on the AR glasses 1904 of the motion-capture performer 308. Controls on the AR glasses 1904 and/or on a remote control device can allow the motion-capture performer 308 to toggle back and forth between views of the different audiences 1916, 1918, etc. Thus, the motion-capture performer 308 can interact with different audiences in different locations individually during a live performance.

FIG. 21 illustrates how two human performers can control elements of the 3-D environment 502 through predetermined motions and/or positions, according to some embodiments. This figure is seen from the view of the motion-capture performer 308 while wearing the AR glasses 1904. For example, the motion-capture performer 308 can hold their left arm outstretched in front of them and close their hand into a fist. This motion of the corresponding motion-capture frame 404 of the motion-capture performer 308 can be identified by the game engine as a predefined motion and trigger a corresponding action. Raising the left arm and closing the fist may cause the game engine to insert a virtual asset into the 3-D environment 502. In this case, the virtual asset may include a 3-D model of a bouquet of flowers. The game engine can insert the 3-D model of the bouquet of flowers into the left hand of the motion-capture frame 404 of the motion-capture performer 308. The 3-D model of the bouquet of flowers can remain attached to the left hand of the motion-capture frame 404 in the 3-D environment 502 and have its movements governed by the physics engine until a subsequent motion or gesture by the motion-capture performer 308 or another performer dictates otherwise.

The AR glasses 1904 worn by the motion-capture performer 308 may render a 2-D image of the 3-D environment 502 based at least in part on their physical location and their corresponding virtual location in the 3-D environment 502. This allows the motion-capture performer 308 to view the virtual interactions with the audience. FIG. 21 illustrates such a view through the AR glasses 1904 worn by the motion-capture performer 308. In this example, the motion-capture performer 308 would view their own arm with a re-skinned exterior, such as the magician's tuxedo that would be seen by the audience members as described above. Thus, the motion-capture performer 308 would see their own limbs, etc., as the re-skinned motion-capture frame 404 from the 3-D environment 502. Additionally, the motion-capture performer 308 would view a rendered image of the bouquet of flowers 2102 as composited on the AR glasses 1904. Specifically, as the motion-capture performer 308 raises their left hand and closes their fist, they would see the rendered image of the bouquet of flowers 2102 appear in their left hand.

As described above, the second stage 1902 may be equipped with depth cameras, motion-capture cameras, visible-light cameras, and so forth. When the motion-capture performer calls for a volunteer, an audience member from the audience 1918 may walk onto or approach the second stage 1902. When this occurs, the depth cameras and/or motion-capture cameras of the second stage 1902 can see the audience member, generate or load a 3-D virtual model of the audience member into the 3-D environment 502, and render a view of the audience member 2004 that would be visible to the motion-capture performer 308 through the AR glasses 1904. The audience member may be made to appear as they look in the real-world environment of the second stage 1902, or the audience member may be re-skinned to appear as any other character, creature, and so forth.

This environment creates a shared reality experience that is mixed between two different physical venues via the AR glasses worn by the audience member(s) and the motion-capture performer 308. Elements from each physical location may be re-created in a shared 3-D environment 502, and rendered views of that shared 3-D environment can be generated from unique positions and/or perspectives of each participant. For example, any objects that appear in the second performance area 302 (e.g., a table, a magician's hat on the table, other magic props, etc.) can be generated in the 3-D environment 502. These objects would be rendered for the composited views of the audience members, but would be left out of the rendering operation for the view of the motion-capture performer 308. Similarly, objects on the second stage 1902 can be viewed naturally by the members of the audience 1918 and would be omitted from their rendered views of the shared 3-D environment 502. However, these objects may be rendered for the composited view for the motion-capture performer 308.

Because the 3-D environment 502 is shared between all the participants, multiple participants can interact with virtual assets using gestures, motions, and positions of their corresponding motion-capture frames. In this example, the motion-capture performer 308 is able to hand off the bouquet of flowers to the audience member on the stage. As the motion-capture performer 308 opens their hand, this gesture can be recognized by the game engine to uncouple the 3-D model of the bouquet of flowers from their virtual hand. Similarly, as the motion-capture frame of the audience member raises its hand and closes its fist, the game engine can recognize this gesture and attach the 3-D model of the bouquet of flowers to the virtual hand of the audience member, provided that the 3-D model of the bouquet of flowers is within a threshold distance of the hand of the audience member.

FIG. 22 illustrates a view of the scene from FIG. 21 from the perspective of a member of the audience 1918, according to some embodiments. In this view, the audience member 2202 would be seen by the member of the audience 1918 with their natural eyes. The rendered 2-D view of the magician 2002 and the rendered 2-D view of the bouquet of flowers 2102 would be visible as a composited image on their AR glasses. Thus, it would appear to every member of the audience 1918 that the 2-D view of the magician 2002 hands the 2-D view of the bouquet of flowers 2102 to the physical audience member 2202. This view for each audience member would be based on their location and/or orientation, and a unique render for each member of the audience 1918 would be provided in each image frame as described above. Again, it should be emphasized that although single image frames have been discussed above, each of these image frames may often be part of a continuous video sequence that is composited and viewed on mobile devices, screens, AR glasses, and so forth in real time. Therefore, members of the audience 1918 would not just see an image of the magician 2002; they would see a continuous, lifelike video stream of the magician in their composited views.

These figures illustrate just one example of how virtual assets can be created and/or manipulated based on recognized motion gestures and/or positions of various performers in different physical locations in a shared 3-D environment 502. Based on this description, it should be recognized that many other types of interactions with virtual assets are enabled by this technology. For example, participants in multiple locations may participate in a shared fight environment that includes additional CGI characters that react realistically when punched, kicked, etc. Human participants in multiple locations can engage in magic duels and attack each other with virtual effects, such as lightning, fire, ice, and other fantastical simulated powers. Participants can change their location in a virtual scene in ways that defy normal physics. For example, by raising their hands over their head, a participant can cause their corresponding motion-capture frame to fly upwards in the 3-D environment 502. By performing the predefined gesture of snapping their fingers, participants can “warp” their motion-capture frame to different locations in the 3-D environment 502. Participants in different locations can participate in shared sporting events in the 3-D environment 502. For example, players on different physical basketball courts in different locations can use this technology to play in a shared basketball game, where the basketball itself is a virtual asset that is governed by the gestures and positions of the players. Band members can rehearse or record with each other while residing in different locations. Aside from these examples, this technology may be used to implement any other shared-reality experience where interactions with virtual assets can be governed by human gestures/positions.

FIG. 23 illustrates a flowchart 2300 of a method for providing a shared-reality experience, according to some embodiments. The method may include receiving a first motion or position of a first performer in a first real-world environment (2302). The performer may include a motion-capture performer whose movements/position are captured by a motion-capture system. The motion or position of the performer may also be captured by a plurality of depth cameras. The first real-world environment may include the second performance area 302 described above. For example, the first real-world environment may include the area where the performer depicted as a magician performs in the real world.

The method may also include identifying the first motion or position as a first predefined motion or position (2304). Some embodiments may encode the motion or position as a 3-D frame comprising vertices and/or wireframe representations of the performer. The 3-D frame may be provided to a game engine that has been altered to include code that recognizes motions of the 3-D frame and/or positions of the 3-D frame by comparison to a predefined set of motions and/or positions. The identified motion and/or position may include hand gestures, body movements, dance moves, fighting actions, stunts, acrobatics, and so forth. The identified motion and/or position may also include more subtle movements, such as finger gestures, head positions and/or rotations, foot movements, or any other position/motion that may be represented by the 3-D frame and/or captured for the performer. For example, the gesture may include the motion-capture performer 308 depicted as a magician raising their left arm and closing their fist as described above.

The method may additionally include adding or altering a virtual asset in a 3-D virtual environment in response to identifying the first motion or position as the first predefined motion or position (2306). The 3-D virtual environment may model the first real-world environment described above, including 3-D models of any props or scenery in the first real-world environment. The 3-D virtual environment may also model a second real-world environment, such as the first performance area 102 described above. For example, the second real-world environment may include the first stage 1901 and/or the second stage 1902. The 3-D virtual environment may also comprise the 3-D environment 502 described above. The 3-D virtual environment may include volumetric objects and/or 3-D models of objects that are found in the second real-world environment, such as physical objects, human characters, props, scenery, and/or any other physical characteristics of the second real-world environment. In some embodiments, the 3-D virtual environment may be a shared 3-D virtual environment that allows for the virtual interaction between individuals in the first real-world environment and the second real-world environment. The 3-D virtual environment may share props, objects, scenery, and so forth, between the different real-world environments.

The virtual asset may include one or more 3-D models of digital characters, digital objects, digital scenery, and other objects that can be added and/or moved in the 3-D virtual environment. The virtual asset may also include digital effects, such as flames, ice, weather effects, smoke, fog, explosions, lightning, or any other digital effect. In some embodiments, the virtual asset may include a plurality of virtual assets of different types. For example, the virtual asset may be the 3-D model of the bouquet of flowers described above in FIG. 21.

The method may further include receiving a second real-time motion or position of a second performer in a second real-world environment (2308). The second real-world environment may include the first stage 1901, the second stage 1902, or any other type of real-world environment described above in which motions and/or positions of human performers can be captured and analyzed. For example, the second performer may include the audience member called to the second stage 1902 in the example above. The method may also include identifying the second real-time motion or position as a second predefined motion or position (2310). The second predefined motion or position may be recognized in the same or similar manner as the first predefined motion or position was recognized. In some instances, the second predefined motion or position may be the same as the first predefined motion or position. In other instances, the second predefined motion or position may be different from the first predefined motion or position. The method may additionally include altering the virtual asset in the 3-D virtual environment in response to identifying the second motion or position as the second predefined motion or position (2312). In the example above, this may include the “handoff” of the bouquet of flowers between the magician and the audience member. Other examples may include throwing a ball back and forth between virtual performers, etc. In some embodiments, the virtual asset may be the same virtual asset for both performers, whereas in other embodiments the virtual assets may be different for each performer. For example, performers may generate different virtual effects that qualify as the virtual asset described above.

In some embodiments, this method may be augmented using the techniques described in detail above for creating a real-time, composited view of the 3-D environment 502 for the audience members in any location that uses the immersive content presentation system. For example, composited views of the first performer, the second performer, and/or the virtual asset may be rendered and provided to mobile devices of the audience members, such as smart phones, AR glasses, VR goggles, or one or more large display screens.

Another technical problem existed prior to this disclosure that made this type of shared-reality experience very difficult to implement. Specifically, this technical problem related to the transmission and packaging of communications to maintain a real-time stream of data between all of the various devices in the immersive content presentation system. As described above in relation to FIGS. 14A-14C, there are many different configurations of servers, environments, cameras, sensors, mobile computing devices, displays, VR goggles, AR glasses, etc., that may be used in the immersive content presentation system. Because of this diversity of device types, different types of data may need to be transmitted between different devices. The type and amount of data may depend on the capabilities of each device. For example, the type and amount of data transmitted between devices may depend on where images are rendered, where images are composited, where the 3-D environment is stored and manipulated, where the game engine detects motion-capture gestures and/or positions, and so forth. Existing communication protocols were not tailored to this shared-reality experience, and thus resulted in communication packets that were bloated, unnecessary, or insufficient, which in turn led to an increase in bandwidth usage across the wireless networks used by the immersive content presentation system. This increased bandwidth usage can interfere with the ability to capture, render, composite, and display images in real time for live audiences.

To overcome this and other technical problems, an improvement to communication technology is described in this disclosure. Specifically, a bandwidth-efficient and data-efficient protocol has been designed that is adaptable for any shared-reality device configuration. As described in the sections above, the immersive content presentation system may transmit live video streams, rendered images, composited images, virtual asset identifiers, and 3-D transformations to be applied to virtual assets in a 3-D environment. The communication protocol described below accommodates each of these different types of data and transmits them in the most memory-efficient and bandwidth-efficient manner. This efficiency allows the immersive content presentation system to perform in real time for large audiences using large numbers of display devices. For example, this protocol can allow stadiums or large sports venues to provide real-time, rendered, composited views of a shared-reality experience involving thousands of display devices simultaneously. At the same time, this protocol also works efficiently in a home theater environment or other small-venue implementations of the immersive content presentation system.

FIG. 24 illustrates a diagram of the communication protocol, according to some embodiments. A live video stream can be broken up into individual frames that are recorded and/or presented at interactive frame rates (e.g., 10 fps, 20 fps, less than 1 second delay, etc.). For each frame, the communication protocol can prepare a single communication packet. Each single communication packet can be configured to store up to a complete set of information required to provide a composited view of a single frame in the live video stream. In some embodiments, each communication packet may include at least two fields. A first field 2402 can be configured to represent a 2-D image 2406. For example, the 2-D image 2406 may depict at least part of a view of a real-world environment. Alternatively, the 2-D image 2406 may depict a rendered view of at least part of the 3-D virtual environment. The 2-D image 2406 may be derived from a live video stream from a visible-light camera, such as camera 106 in the first performance area 102. Individual frames from this video stream can be packaged in the first field 2402 of each packet in the communication protocol. Each of these frames can be used as a background for a compositing operation for generating a composite view that is displayed on the display device as illustrated in the examples above.

In some configurations, the 2-D image 2406 may include a rendered view of the 3-D environment 502 from a perspective of a virtual camera in the 3-D environment 502. When being transmitted to a display device, the rendered view of the 3-D environment 502 may be rendered from a perspective corresponding to the location of the display device relative to a performance area, such as the first performance area 102. In these configurations, multiple communication packets can be transmitted to different devices for each frame, where each frame is rendered from a different perspective. This allows a central server to perform multiple rendering operations and transmit images to various display devices. In some embodiments, the rendered view may include cutouts that provide a 3-D perspective for virtual objects being superimposed on live images. Therefore, the rendered view may include rendered views of virtual assets that are not present in a live view of a performance area. Alternatively, the rendered view may be a complete rendering of the 3-D environment.

In some configurations, the 2-D image 2406 may include a composited image. The composited image may include a background layer comprised of a frame from a live video feed from a camera. The composited image may also include a foreground or second layer comprised of rendered portions of the 3-D environment 502. This configuration may be suitable for transmitting images that are ready for display on a large display screen or any other viewpoint-agnostic display method.

In some implementations, the first field 2402 may be left unpopulated. Therefore, a field that is configured to represent a 2-D image may also represent a blank image or null image. This may be useful for configurations where the rendering and/or compositing operations are performed at the individual display devices. This may also be useful for configurations where the display devices capture the live video feed upon which the rendered images will be composited, such as with smart phones, AR glasses, and/or the like.

In addition to the first field that is configured to represent a 2-D image, the communication protocol may also include a second field 2404 that is configured to represent one or more 3-D transforms that can be applied to virtual assets in the 3-D virtual environment. As described above, the 3-D environment 502 may include any number of virtual assets. The communication protocol accommodates this by allowing the second field 2404 to include any number of components, each of which may contain a transform for a specific virtual asset in the 3-D environment. Each transform may include a matrix or other mathematical expression 2410 that represents the transform to be applied to the virtual asset. Additionally, each transform may be associated with a unique identifier 2408 that identifies the virtual asset in the 3-D environment to which the transform should be applied.
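
In code, a packet with these two fields might be modeled as below. The field names, the byte-encoded image, and the 4x4 matrix representation are assumptions; the reference numerals in the comments are from FIG. 24.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class AssetTransform:
    asset_id: str        # unique identifier 2408 of the target virtual asset
    matrix: np.ndarray   # mathematical expression 2410, e.g., a 4x4 matrix

@dataclass
class FramePacket:
    image: Optional[bytes] = None                                   # field 2402
    transforms: List[AssetTransform] = field(default_factory=list)  # field 2404
```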

Any type of matrix or mathematical expression 2410 can be used to represent a transform. Common geometric transformations that are used in computer graphics may include rotation, scaling, shearing, reflection, orthogonal projection, translation, and so forth. These may generally be represented by matrices that can be multiplied with coordinate values of different points on the virtual asset to change the location of these coordinates and apply the transformation to the virtual asset. As described above, these transformations may result from the game engine identifying a motion or position of a motion-capture performer as a predefined motion or position that triggers the transform to be applied to one or more virtual assets in the 3-D environment.
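
As a worked example of the matrix arithmetic, the standard homogeneous-coordinate convention applies a 4x4 transform to each vertex. The composition order shown is the usual right-to-left convention of computer graphics, not anything specific to this system.

```python
import numpy as np

def apply_transform(vertices: np.ndarray, m: np.ndarray) -> np.ndarray:
    """Apply a 4x4 transform m to an (N, 3) array of vertex positions."""
    homo = np.hstack([vertices, np.ones((len(vertices), 1))])  # (N, 4)
    moved = homo @ m.T
    return moved[:, :3] / moved[:, 3:4]  # divide out the homogeneous w

# Example: translate by (1, 0, 0), then scale uniformly by 2.
translate = np.eye(4)
translate[:3, 3] = [1.0, 0.0, 0.0]
scale = np.diag([2.0, 2.0, 2.0, 1.0])
combined = scale @ translate  # matrices compose right-to-left
```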

Sending 3-D transforms in each communication packet allows the immersive content presentation system to have different composited views rendered remotely in a distributed fashion rather than requiring that they all take place at the same server. Previously, a single version of the 3-D environment was stored at the server, and the server was required to render each view of the 3-D environment from the different perspectives of the display devices, such as AR glasses. When the number of display devices grows beyond the capability of the server to simultaneously render different views of the 3-D environment, real-time display of composited views for each display device is no longer possible. In these embodiments, the 3-D virtual environment can be transmitted a single time to each of the devices in the immersive content presentation system. Therefore, multiple devices may simultaneously store a representation of the 3-D environment. This may include 3-D objects representing live props, motion-capture frames, different “skins” that may be applied to motion-capture frames, virtual effects, 3-D models of CGI characters, and so forth.

To keep the different versions of the 3-D environment up to date on each of the distributed devices, the communication protocol described herein transmits transforms from the server to all of the devices in the immersive content presentation system that have copies of the 3-D environment. This allows each receiving device to individually apply the transforms to the 3-D environment. Generally, applying these transforms in real time is computationally inexpensive and can be easily performed by even lightweight processing systems. This communication protocol is very advantageous because transmitting the 3-D transforms requires only a fraction of the bandwidth that would be required to otherwise transmit the entirety of the 3-D environment with each frame, which would otherwise be required for distributed rendering.
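
On the receiving side, applying a packet's transforms to the local copy of the environment is a short loop. This sketch reuses the FramePacket and apply_transform sketches above and assumes the local scene is a mapping from asset identifiers to vertex arrays.

```python
def on_packet(packet: "FramePacket", scene: dict) -> None:
    # Only matrices cross the network; each device updates its own copy
    # of the 3-D environment rather than receiving it again per frame.
    for t in packet.transforms:
        scene[t.asset_id] = apply_transform(scene[t.asset_id], t.matrix)
```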

As was the case with the first field 2402, the second field 2404 may include null or empty data. For example, in cases where the rendering operation is performed centrally and the rendered images are transmitted to the various display devices, it may not be necessary to transmit any 3-D transforms to any display devices. Therefore, the second field 2404 may be null/zero or left unpopulated for these configurations.

In some embodiments, each communication packet in the communication protocol may also include a timestamp 2412, a location code 2414, and/or a spatial anchor point that identifies a common coordinate origin for each of the distributed devices. Because people should see the same virtual objects in the same real-world locations, distributed configurations may require a shared understanding of spatial anchors. In some embodiments, a visual fiducial (e.g., an image, a pattern, a printed page, or any other distinguishable visible item) can be placed at an origin location in each real-world environment. Each device, when viewing its respective real-world environment, can determine a position and orientation of the visual fiducial and store this anchor point as an offset. The offset can then be prepended as necessary to camera transforms that are sent between the device and the server.
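
The anchor-offset idea can be expressed as one matrix product: expressing a camera pose relative to the fiducial's pose gives every device the same origin. The 4x4 pose representation here is an assumption for illustration.

```python
import numpy as np

def to_shared_coordinates(camera_pose: np.ndarray,
                          fiducial_pose: np.ndarray) -> np.ndarray:
    """Re-express a device-local camera pose relative to the shared fiducial.

    Both arguments are 4x4 poses in the device's local coordinates; the
    result is the camera pose in the common coordinate frame, suitable
    for prepending to camera transforms sent to the server.
    """
    return np.linalg.inv(fiducial_pose) @ camera_pose
```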

FIG. 25 illustrates a transmission using the communication protocol that supports remote rendering, according to some embodiments. In this example, a representation of the 3-D virtual scene 1414 can be distributed to the server 1430 and to a plurality of display devices 2502, 2504. Each communication packet may include a first field 2402 and a second field 2404 as described above. The first field 2402 may include image frames from a live video stream of a performance area, such as the first performance area 102. For example, this may include image frames that include the human actor 108, the table 110, the chair 112, and so forth. Individual frames from this video stream may be transmitted to each of the display devices 2502, 2504 to be used as a background layer for compositing rendered images for the final composited view.

The second field 2404 in the packets transmitted by the communication protocol may include 3-D transforms that may be applied to the 3-D virtual scene 1414 by each of the display devices 2502, 2504. For example, each of the display devices 2502, 2504 may include a rendering engine 1418 and a compositing engine 1420. The display devices 2502, 2504 can then apply the 3-D transforms to the 3-D virtual scene 1414, render individual views of the 3-D virtual scene 1414, composite views of the 3-D virtual scene 1414 on top of the image frames received in the first field 2402 of the communication protocol, and provide the composited view for display.

FIG. 26 illustrates a transmission using the communication protocol that supports centralized rendering, according to some embodiments. In this configuration, the display devices 2502, 2504 do not require any rendering engine 1418 or compositing engine 1420. Instead, the server 1430 can perform all of the rendering operations centrally. Thus, the first field 2402 can include a composite view generated by rendering the 3-D virtual scene 1414 and compositing that scene on a real-time view of a performance area. Each communication packet in the communication protocol can include a 2-D image from the composited video stream. However, because the 3-D virtual scene 1414 has been fully rendered and exists only on the server 1430, there is no need to transmit any 3-D transforms in the second field 2404 of the communication packet. This embodiment may be used for configurations where the display devices 2502, 2504 are not required or able to perform heavy compositing/rendering operations.

FIG. 27 illustrates a transmission using the communication protocol that supports remote rendering/compositing and image capture, according to some embodiments. In this configuration, each of the display devices 2502, 2504 may include a copy of the 3-D virtual scene 1414. No rendering needs to take place at the server 1430. Additionally, each of the display devices 2502, 2504 may include a camera or other means of seeing a live view of the performance area. For example, the display devices 2502, 2504 may include a camera that displays a real-time image on a screen, or may include AR glasses that provide a natural view of the performance area. Because each of the display devices 2502, 2504 captures or creates its own 2-D images for a background compositing layer, there is no need to transmit any 2-D images in the first field 2402 of the communication packet. Therefore, the first field 2402 can include a blank image or can be set to null/zero memory to preserve bandwidth. However, the second field 2404 may continue to transmit 3-D transforms such that the 3-D virtual scene 1414 can maintain consistency across all of the devices.
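
The configurations of FIGS. 25-27 differ only in which packet fields are populated. The sketch below, reusing the FramePacket sketch from above, shows one plausible way a server might populate packets per configuration; the configuration names are assumptions.

```python
from typing import List, Optional

def make_packet(config: str, frame: Optional[bytes],
                transforms: List["AssetTransform"]) -> "FramePacket":
    if config == "centralized":      # FIG. 26: composited frame, no transforms
        return FramePacket(image=frame, transforms=[])
    if config == "device_capture":   # FIG. 27: device has its own live view
        return FramePacket(image=None, transforms=transforms)
    return FramePacket(image=frame, transforms=transforms)  # FIG. 25
```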

FIG. 28 illustrates a transmission using the communication protocol that supports central rendering and remote compositing, according to some embodiments. In this implementation, the 3-D virtual scene 1414 only needs to exist at the server with the rendering engine 1418. As described above, the server 1430 can render a 2-D image for the first field 2402 of the communication packet that includes cutouts of the rendered objects such that the 2-D image is ready for compositing. The display devices 2502, 2504 can receive the rendered image and composite the image with a live image, such as an image captured by a phone camera or an image visible through AR glasses.

FIG. 29 illustrates a flowchart 2900 of a method for using a communication protocol that efficiently shares information in a shared-reality immersive content presentation system, according to some embodiments. The method may include receiving changes to a virtual asset in a 3-D virtual environment (2902). The 3-D virtual environment may be shared between a plurality of real-world environments, such as a motion-capture area, a performance area, an audience, and so forth. The method may also include preparing a stream of real-time communication packets (2904). The packets may include a first field that is configured to represent a 2-D image depicting at least in part a view of a first real-world environment in the plurality of real-world environments or a view of the 3-D virtual environment. The packets may also include a second field that is configured to represent one or more 3-D transforms to be applied to the virtual asset in the 3-D virtual environment. The method may also include transmitting the stream of real-time communication packets to a plurality of display devices distributed in the plurality of real-world environments (2906). The transmitted communication packets can be used to build or present composited views as described in detail above.

It should be appreciated that the specific steps illustrated in FIG. 29 provide particular methods of using a communication protocol according to various embodiments of the present invention. Other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 29 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.

Each of the methods described herein may be implemented by a computer system. Each step of these methods may be executed automatically by the computer system, and/or may be provided with inputs/outputs involving a user. For example, a user may provide inputs for each step in a method, and each of these inputs may be in response to a specific output requesting such an input, wherein the output is generated by the computer system. Each input may be received in response to a corresponding requesting output. Furthermore, inputs may be received from a user, from another computer system as a data stream, retrieved from a memory location, retrieved over a network, requested from a web service, and/or the like. Likewise, outputs may be provided to a user, to another computer system as a data stream, saved in a memory location, sent over a network, provided to a web service, and/or the like. In short, each step of the methods described herein may be performed by a computer system, and may involve any number of inputs, outputs, and/or requests to and from the computer system which may or may not involve a user. Those steps not involving a user may be said to be performed automatically by the computer system without human intervention. Therefore, it will be understood in light of this disclosure that each step of each method described herein may be altered to include an input and output to and from a user, or may be done automatically by a computer system without human intervention where any determinations are made by a processor. Furthermore, some embodiments of each of the methods described herein may be implemented as a set of instructions stored on a tangible, non-transitory storage medium to form a tangible software product.

FIG. 30 illustrates a computer system 3000, in which various embodiments described herein may be implemented. The system 3000 may be used to implement any of the computer systems described above. As shown in the figure, computer system 3000 includes a processing unit 3004 that communicates with a number of peripheral subsystems via a bus subsystem 3002. These peripheral subsystems may include a processing acceleration unit 3006, an I/O subsystem 3008, a storage subsystem 3018, and a communications subsystem 3024. Storage subsystem 3018 includes tangible computer-readable storage media 3022 and a system memory 3010.

Bus subsystem 3002 provides a mechanism for letting the various components and subsystems of computer system 3000 communicate with each other as intended. Although bus subsystem 3002 is shown schematically as a single bus, alternative embodiments of the bus subsystem may utilize multiple buses. Bus subsystem 3002 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. For example, such architectures may include an Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, which can be implemented as a Mezzanine bus manufactured to the IEEE P1386.1 standard.

Processing unit 3004, which can be implemented as one or more integrated circuits (e.g., a conventional microprocessor or microcontroller), controls the operation of computer system 3000. One or more processors may be included in processing unit 3004. These processors may include single-core or multicore processors. In certain embodiments, processing unit 3004 may be implemented as one or more independent processing units 3032 and/or 3034 with single or multicore processors included in each processing unit. In other embodiments, processing unit 3004 may also be implemented as a quad-core processing unit formed by integrating two dual-core processors into a single chip.

In various embodiments, processing unit 3004 can execute a variety of programs in response to program code and can maintain multiple concurrently executing programs or processes. At any given time, some or all of the program code to be executed can be resident in processor(s) 3004 and/or in storage subsystem 3018. Through suitable programming, processor(s) 3004 can provide various functionalities described above. Computer system 3000 may additionally include a processing acceleration unit 3006, which can include a digital signal processor (DSP), a special-purpose processor, and/or the like.

I/O subsystem 3008 may include user interface input devices and user interface output devices. User interface input devices may include a keyboard, pointing devices such as a mouse or trackball, a touchpad or touch screen incorporated into a display, a scroll wheel, a click wheel, a dial, a button, a switch, a keypad, audio input devices with voice command recognition systems, microphones, and other types of input devices. User interface input devices may include, for example, motion sensing and/or gesture recognition devices such as the Microsoft Kinect® motion sensor that enables users to control and interact with an input device, such as the Microsoft Xbox® 360 game controller, through a natural user interface using gestures and spoken commands. User interface input devices may also include eye gesture recognition devices such as the Google Glass® blink detector that detects eye activity (e.g., ‘blinking’ while taking pictures and/or making a menu selection) from users and transforms the eye gestures as input into an input device (e.g., Google Glass®). Additionally, user interface input devices may include voice recognition sensing devices that enable users to interact with voice recognition systems (e.g., Siri® navigator) through voice commands.

User interface input devices may also include, without limitation, three-dimensional (3-D) mice, joysticks or pointing sticks, gamepads and graphic tablets, and audio/visual devices such as speakers, digital cameras, digital camcorders, portable media players, webcams, image scanners, fingerprint scanners, barcode readers, 3-D scanners, 3-D printers, laser rangefinders, and eye gaze monitoring devices. Additionally, user interface input devices may include, for example, medical imaging input devices such as computed tomography, magnetic resonance imaging, positron emission tomography, and medical ultrasonography devices. User interface input devices may also include, for example, audio input devices such as MIDI keyboards, digital musical instruments, and the like.

User interface output devices may include a display subsystem, indicator lights, or non-visual displays such as audio output devices, etc. The display subsystem may be a cathode ray tube (CRT), a flat-panel device, such as that using a liquid crystal display (LCD) or plasma display, a projection device, a touch screen, and the like. In general, use of the term “output device” is intended to include all possible types of devices and mechanisms for outputting information from computer system 3000 to a user or other computer. For example, user interface output devices may include, without limitation, a variety of display devices that visually convey text, graphics, and audio/video information such as monitors, printers, speakers, headphones, automotive navigation systems, plotters, voice output devices, and modems.

Computer system 3000 may comprise a storage subsystem 3018 that comprises software elements, shown as being currently located within a system memory 3010. System memory 3010 may store program instructions that are loadable and executable on processing unit 3004, as well as data generated during the execution of these programs.

Depending on the configuration and type of computer system 3000, system memory 3010 may be volatile (such as random access memory (RAM)) and/or non-volatile (such as read-only memory (ROM), flash memory, etc.). The RAM typically contains data and/or program modules that are immediately accessible to and/or presently being operated and executed by processing unit 3004. In some implementations, system memory 3010 may include multiple different types of memory, such as static random access memory (SRAM) or dynamic random access memory (DRAM). In some implementations, a basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer system 3000, such as during start-up, may typically be stored in the ROM. By way of example, and not limitation, system memory 3010 also illustrates application programs 3012, which may include client applications, Web browsers, mid-tier applications, relational database management systems (RDBMS), etc., program data 3014, and an operating system 3016. By way of example, operating system 3016 may include various versions of Microsoft Windows®, Apple Macintosh®, and/or Linux operating systems, a variety of commercially-available UNIX® or UNIX-like operating systems (including without limitation the variety of GNU/Linux operating systems, the Google Chrome® OS, and the like), and/or mobile operating systems such as iOS, Windows® Phone, Android® OS, BlackBerry® 10 OS, and Palm® OS operating systems.

Storage subsystem 3018 may also provide a tangible computer-readable storage medium for storing the basic programming and data constructs that provide the functionality of some embodiments. Software (programs, code modules, instructions) that, when executed by a processor, provides the functionality described above may be stored in storage subsystem 3018. These software modules or instructions may be executed by processing unit 3004. Storage subsystem 3018 may also provide a repository for storing data used in accordance with the present invention.

Storage subsystem 3018 may also include a computer-readable storage media reader 3020 that can further be connected to computer-readable storage media 3022. Together and, optionally, in combination with system memory 3010, computer-readable storage media 3022 may comprehensively represent remote, local, fixed, and/or removable storage devices plus storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information.

Computer-readable storage media 3022 containing code, or portions of code, can also include any appropriate media known or used in the art, including storage media and communication media, such as, but not limited to, volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information. This can include tangible computer-readable storage media such as RAM, ROM, electronically erasable programmable ROM (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disk (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other tangible computer-readable media. This can also include nontangible computer-readable media, such as data signals, data transmissions, or any other medium that can be used to transmit the desired information and that can be accessed by computer system 3000.

By way of example, computer-readable storage media 3022 may include a hard disk drive that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive that reads from or writes to a removable, nonvolatile magnetic disk, and an optical disk drive that reads from or writes to a removable, nonvolatile optical disk such as a CD-ROM, DVD, Blu-Ray® disk, or other optical media. Computer-readable storage media 3022 may include, but is not limited to, Zip® drives, flash memory cards, universal serial bus (USB) flash drives, secure digital (SD) cards, DVD disks, digital video tape, and the like. Computer-readable storage media 3022 may also include solid-state drives (SSDs) based on non-volatile memory such as flash-memory-based SSDs, enterprise flash drives, and solid-state ROM; SSDs based on volatile memory such as solid-state RAM, dynamic RAM, static RAM, and DRAM-based SSDs; magnetoresistive RAM (MRAM) SSDs; and hybrid SSDs that use a combination of DRAM- and flash-memory-based SSDs. The disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for computer system 3000.

Communications subsystem 3024 provides an interface to other computer systems and networks. Communications subsystem 3024 serves as an interface for receiving data from and transmitting data to other systems from computer system 3000. For example, communications subsystem 3024 may enable computer system 3000 to connect to one or more devices via the Internet. In some embodiments, communications subsystem 3024 can include radio frequency (RF) transceiver components for accessing wireless voice and/or data networks (e.g., using cellular telephone technology; advanced data network technology such as 3G, 4G, or EDGE (enhanced data rates for GSM evolution); WiFi (IEEE 802.11 family standards); or other mobile communication technologies; or any combination thereof), global positioning system (GPS) receiver components, and/or other components. In some embodiments, communications subsystem 3024 can provide wired network connectivity (e.g., Ethernet) in addition to or instead of a wireless interface.
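As a rough sketch of this receive/transmit role (the class and method names below are invented for illustration, and a single TCP socket stands in for whichever wired or wireless transport is actually used), a communications subsystem might be modeled as:

```python
import socket
from typing import Optional


class CommunicationsSubsystem:
    """Illustrative stand-in for a receive/transmit network interface."""

    def __init__(self, host: str, port: int) -> None:
        self._address = (host, port)
        self._sock: Optional[socket.socket] = None

    def connect(self) -> None:
        # Transport selection (Ethernet, WiFi, cellular, ...) would happen
        # below this abstraction; a TCP connection stands in for all of them.
        self._sock = socket.create_connection(self._address)

    def transmit(self, payload: bytes) -> None:
        # Send data from this system to another system.
        assert self._sock is not None, "connect() must be called first"
        self._sock.sendall(payload)

    def receive(self, max_bytes: int = 4096) -> bytes:
        # Receive data on behalf of this system.
        assert self._sock is not None, "connect() must be called first"
        return self._sock.recv(max_bytes)
```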

In some embodiments, communications subsystem 3024 may also receive input communication in the form of structured and/or unstructured data feeds 3026, event streams 3028, event updates 3030, and the like on behalf of one or more users who may use computer system 3000.

By way of example, communications subsystem 3024 may be configured to receive data feeds 3026 in real time from users of social networks and/or other communication services such as Twitter® feeds, Facebook® updates, web feeds such as Rich Site Summary (RSS) feeds, and/or real-time updates from one or more third-party information sources.

Additionally, communications subsystem 3024 may also be configured to receive data in the form of continuous data streams, which may include event streams 3028 of real-time events and/or event updates 3030, and which may be continuous or unbounded in nature with no explicit end. Examples of applications that generate continuous data may include, for example, sensor data applications, financial tickers, network performance measuring tools (e.g., network monitoring and traffic management applications), clickstream analysis tools, automobile traffic monitoring, and the like.
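To make the continuous, unbounded character of event streams 3028 concrete, the following minimal Python sketch (an illustration only; the event schema and all names here are assumptions, not taken from this disclosure) models such a stream as an iterator that never terminates on its own, together with a consumer that reads from it:

```python
import itertools
import random
import time
from typing import Dict, Iterator


def sensor_event_stream() -> Iterator[Dict]:
    """Yield an unbounded stream of sensor readings (invented schema)."""
    for seq in itertools.count():
        yield {"seq": seq, "value": random.random(), "timestamp": time.time()}


def consume(stream: Iterator[Dict], limit: int) -> None:
    # A real consumer would run until shutdown; 'limit' bounds this demo.
    for event in itertools.islice(stream, limit):
        print(event)


if __name__ == "__main__":
    consume(sensor_event_stream(), limit=5)
```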

Communications subsystem 3024 may also be configured to output the structured and/or unstructured data feeds 3026, event streams 3028, event updates 3030, and the like to one or more databases that may be in communication with one or more streaming data source computers coupled to computer system 3000.

Computer system 3000 can be one of various types, including a handheld portable device (e.g., an iPhone® cellular phone, an iPad® computing tablet, a PDA), a wearable device (e.g., a Google Glass® head-mounted display), a PC, a workstation, a mainframe, a kiosk, a server rack, or any other data processing system.

Due to the ever-changing nature of computers and networks, the description of computer system 3000 depicted in the figure is intended only as a specific example. Many other configurations having more or fewer components than the system depicted in the figure are possible. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, firmware, software (including applets), or a combination. Further, connection to other computing devices, such as network input/output devices, may be employed. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.

In the foregoing description, for the purposes of explanation, numerous specific details were set forth in order to provide a thorough understanding of various embodiments of the present invention. It will be apparent, however, to one skilled in the art that embodiments of the present invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form.

The foregoing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the foregoing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.

Specific details are given in the foregoing description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may have been shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may have been shown without unnecessary detail in order to avoid obscuring the embodiments.

Also, it is noted that individual embodiments may have been described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may have described the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed, but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.

The term “computer-readable medium” includes, but is not limited to, portable or fixed storage devices, optical storage devices, wireless channels, and various other mediums capable of storing, containing, or carrying instruction(s) and/or data. A code segment or machine-executable instructions may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc., may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.

Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine-readable medium. A processor(s) may perform the necessary tasks.

In the foregoing specification, aspects of the invention are described with reference to specific embodiments thereof, but those skilled in the art will recognize that the invention is not limited thereto. Various features and aspects of the above-described invention may be used individually or jointly. Further, embodiments can be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive.

Additionally, for the purposes of illustration, methods were described in a particular order. It should be appreciated that in alternate embodiments, the methods may be performed in a different order than that described. It should also be appreciated that the methods described above may be performed by hardware components or may be embodied in sequences of machine-executable instructions, which may be used to cause a machine, such as a general-purpose or special-purpose processor or logic circuits programmed with the instructions, to perform the methods. These machine-executable instructions may be stored on one or more machine-readable mediums, such as CD-ROMs or other types of optical disks, floppy diskettes, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, flash memory, or other types of machine-readable mediums suitable for storing electronic instructions. Alternatively, the methods may be performed by a combination of hardware and software.

What is claimed is:
1. A method comprising: receiving a first motion or position of a first performer in a first real-world environment; identifying the first motion or position as a first predefined motion or position; altering a virtual representation of the first performer in a 3-D virtual environment based on the first motion or position of the first performer in the first real-world environment; altering a virtual asset in the 3-D virtual environment in response to identifying the first motion or position as the first predefined motion or position; receiving a second real-time motion or position of a second performer in a second real-world environment; identifying the second real-time motion or position as a second predefined motion or position; altering the virtual asset in the 3-D virtual environment in response to identifying the second real-time motion or position as the second predefined motion or position; rendering a first 2-D video stream of the virtual representation of the first performer and the virtual asset in the 3-D virtual environment; and compositing the first 2-D video stream with a live view of the second performer in the second real-world environment on an augmented reality device such that the virtual representation of the first performer exchanges possession of the virtual asset with the second performer such that they all appear to be live in the second real-world environment when the second real-world environment is viewed through the augmented reality device.
2. The method of claim 1, further comprising: rendering a 2-D video stream of the virtual asset in the 3-D virtual environment, wherein: the 2-D video stream of the virtual asset includes the altering of the virtual asset in response to identifying the first motion or position; and the 2-D video stream of the virtual asset includes the altering of the virtual asset in response to identifying the second motion or position.
3. The method of claim 2, further comprising: causing a real-time video stream to be displayed on a display device, wherein: the real-time video stream comprises the 2-D video stream of the virtual asset; and the 2-D video stream is composited with a real-time view of the second real-world environment.

4. The method of claim 1, wherein the first real-world environment is physically separated from the second real-world environment.
5. The method of claim 1, wherein the first performer is not visible to the second performer in the first real-world environment.
6. The method of claim 5, wherein the first performer is visible to the second performer through the 3-D virtual environment.
7. The method of claim 1, wherein the first performer is equipped with a pair of AR glasses.
8. An immersive content presentation system comprising: one or more processors; and one or more memory devices comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving a first motion or position of a first performer in a first real-world environment; identifying the first motion or position as a first predefined motion or position; altering a virtual representation of the first performer in a 3-D virtual environment based on the first motion or position of the first performer in the first real-world environment; altering a virtual asset in the 3-D virtual environment in response to identifying the first motion or position as the first predefined motion or position; receiving a second real-time motion or position of a second performer in a second real-world environment; identifying the second real-time motion or position as a second predefined motion or position; altering the virtual asset in the 3-D virtual environment in response to identifying the second real-time motion or position as the second predefined motion or position; rendering a first 2-D video stream of the virtual representation of the first performer and the virtual asset in the 3-D virtual environment; and compositing the first 2-D video stream with a live view of the second performer in the second real-world environment on an augmented reality device such that the virtual representation of the first performer exchanges possession of the virtual asset with the second performer such that they all appear to be live in the second real-world environment when the second real-world environment is viewed through the augmented reality device.
9. The immersive content presentation system of claim 8, wherein altering the virtual asset in the 3-D virtual environment in response to identifying the first motion or position as the first predefined motion or position comprises: causing the virtual asset to be held by the virtual representation of the first performer in the 3-D virtual environment.
10. The immersive content presentation system of claim 9, wherein altering the virtual asset in the 3-D virtual environment in response to identifying the second motion or position as the second predefined motion or position comprises: causing the virtual asset to be passed from the virtual representation of the first performer in the 3-D virtual environment to a virtual representation of the second performer in the 3-D virtual environment.
11. The immersive content presentation system of claim 8, wherein the first predefined motion or position comprises the first performer closing their hand.
12. The immersive content presentation system of claim 8, wherein the first predefined motion or position comprises the first performer pointing their arm or hand at a virtual representation of an object.
13. The immersive content presentation system of claim 8, wherein the first predefined motion or position comprises the first performer executing a throwing motion.
14. The immersive content presentation system of claim 13, wherein the second predefined motion or position comprises the second performer executing a catching motion.
15. A non-transitory, computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving a first motion or position of a first performer in a first real-world environment; identifying the first motion or position as a first predefined motion or position; altering a virtual representation of the first performer in a 3-D virtual environment based on the first motion or position of the first performer in the first real-world environment; altering a virtual asset in the 3-D virtual environment in response to identifying the first motion or position as the first predefined motion or position; receiving a second real-time motion or position of a second performer in a second real-world environment; identifying the second real-time motion or position as a second predefined motion or position; altering the virtual asset in the 3-D virtual environment in response to identifying the second real-time motion or position as the second predefined motion or position; rendering a first 2-D video stream of the virtual representation of the first performer and the virtual asset in the 3-D virtual environment; and compositing the first 2-D video stream with a live view of the second performer in the second real-world environment on an augmented reality device such that the virtual representation of the first performer exchanges possession of the virtual asset with the second performer such that they all appear to be live in the second real-world environment when the second real-world environment is viewed through the augmented reality device.
16. The non-transitory, computer-readable medium of claim 15, wherein the virtual asset comprises a digital effect.
17. The non-transitory, computer-readable medium of claim 15, wherein the virtual asset comprises a CGI character.
18. The non-transitory, computer-readable medium of claim 15, wherein the first performer in the first real-world environment comprises a motion-capture suit that is recorded with a plurality of motion-capture cameras in a motion-capture system.
19. The non-transitory, computer-readable medium of claim 15, wherein identifying the first motion or position as the first predefined motion or position comprises: receiving a motion-capture frame at a game engine; and comparing the first motion or position to a plurality of motions or positions in a predefined library of motions or positions.
20. The non-transitory, computer-readable medium of claim 19, wherein identifying the first motion or position as the first predefined motion or position comprises: comparing calculated motion vectors of vertices on the motion-capture frame with the predefined library of motions or positions.
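By way of illustration only, and not as a limitation of the claims, the comparison recited in claims 19 and 20 could be sketched as a nearest-neighbor match between per-vertex motion vectors and a predefined gesture library. Everything in the following Python sketch is an assumption made for illustration: the library contents, the vector layout, the Euclidean distance metric, and the threshold are invented here, since the claims do not prescribe a particular representation.

```python
from typing import Dict, Optional

import numpy as np

# Hypothetical gesture library: name -> reference motion vectors, one 3-D
# displacement per tracked vertex (two vertices shown for brevity).
GESTURE_LIBRARY: Dict[str, np.ndarray] = {
    "close_hand": np.array([[0.00, -0.10, 0.00], [0.00, -0.10, 0.02]]),
    "throw": np.array([[0.40, 0.20, 0.00], [0.50, 0.10, 0.00]]),
}


def motion_vectors(prev_frame: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Per-vertex displacement between consecutive motion-capture frames."""
    return frame - prev_frame


def identify_gesture(vectors: np.ndarray, threshold: float = 0.05) -> Optional[str]:
    """Return the library gesture nearest to the observed motion vectors,
    or None if no entry falls within the (assumed) distance threshold."""
    best_name, best_dist = None, float("inf")
    for name, reference in GESTURE_LIBRARY.items():
        distance = float(np.linalg.norm(vectors - reference))
        if distance < best_dist:
            best_name, best_dist = name, distance
    return best_name if best_dist < threshold else None
```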